Message ID | 20210914164206.19768-1-rpearsonhpe@gmail.com (mailing list archive) |
---|---|
Series | RDMA/rxe: Various bug fixes. |
Hi Bob, I can verify that rping works after applying this patch series.

Thanks.

Shoaib

On 9/14/21 9:42 AM, Bob Pearson wrote:
> This series of patches implements several bug fixes and minor
> cleanups of the rxe driver. Specifically these fix a bug exposed
> by blktest.
>
> They apply cleanly to both
> commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc)
> commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next)
>
> The first patch is a rewrite of an earlier patch.
> It adds memory barriers to kernel to kernel queues. The logic for this
> is the same as an earlier patch that only treated user to kernel queues.
> Without this patch kernel to kernel queues are expected to intermittently
> fail at low frequency as was seen for the other queues.
>
> The second patch cleans up the state and type enums used by MRs.
>
> The third patch separates the keys in rxe_mr and ib_mr. This allows
> the following sequence seen in the srp driver to work correctly.
>
> do {
>     ib_post_send( IB_WR_LOCAL_INV )
>     ib_update_fast_reg_key()
>     ib_map_mr_sg()
>     ib_post_send( IB_WR_REG_MR )
> } while ( !done )
>
> The fourth patch creates duplicate mapping tables for fast MRs. This
> prevents rkeys referencing fast MRs from accessing data from an updated
> map after the call to ib_map_mr_sg() call by keeping the new and old
> mappings separate and atomically swapping them when a reg mr WR is
> executed.
>
> The fifth patch checks the type of MRs which receive local or remote
> invalidate operations to prevent invalidating user MRs.
>
> v3->v4:
> Two of the patches in v3 were accepted in v5.15 so have been dropped
> here.
>
> The first patch was rewritten to correctly deal with queue operations
> in rxe_verbs.c where the code is the client and not the server.
>
> v2->v3:
> The v2 version had a typo which broke clean application to for-next.
> Additionally in v3 the order of the patches was changed to make
> it a little cleaner.
>
> Bob Pearson (5):
>   RDMA/rxe: Add memory barriers to kernel queues
>   RDMA/rxe: Cleanup MR status and type enums
>   RDMA/rxe: Separate HW and SW l/rkeys
>   RDMA/rxe: Create duplicate mapping tables for FMRs
>   RDMA/rxe: Only allow invalidate for appropriate MRs
>
>  drivers/infiniband/sw/rxe/rxe_comp.c  |  12 +-
>  drivers/infiniband/sw/rxe/rxe_cq.c    |  25 +--
>  drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +
>  drivers/infiniband/sw/rxe/rxe_mr.c    | 267 ++++++++++++++++-------
>  drivers/infiniband/sw/rxe/rxe_mw.c    |  36 ++--
>  drivers/infiniband/sw/rxe/rxe_qp.c    |  12 +-
>  drivers/infiniband/sw/rxe/rxe_queue.c |  30 ++-
>  drivers/infiniband/sw/rxe/rxe_queue.h | 292 +++++++++++---------------
>  drivers/infiniband/sw/rxe/rxe_req.c   |  51 ++---
>  drivers/infiniband/sw/rxe/rxe_resp.c  |  40 +---
>  drivers/infiniband/sw/rxe/rxe_srq.c   |   2 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.c |  92 ++------
>  drivers/infiniband/sw/rxe/rxe_verbs.h |  48 ++---
>  13 files changed, 438 insertions(+), 471 deletions(-)
>
On 9/14/21 7:07 PM, Shoaib Rao wrote:
> Hi Bob, I can verify that rping works after applying this patch series.
>
> Thanks.
>
> Shoaib
>
Thanks, I appreciate your quick response.

Bob
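For readers who have not used the kernel fast-registration API, the do/while pseudocode in Bob's cover letter corresponds roughly to the sketch below: one invalidate / re-key / re-map / re-register cycle posted on a QP's send queue. This is only an illustration of the sequence that patch 3 ("Separate HW and SW l/rkeys") is meant to support, not code from srp or from the series; the helper name and its arguments are invented, and completion handling, fencing and signaling are omitted.

```c
#include <linux/scatterlist.h>
#include <rdma/ib_verbs.h>

/* Hypothetical helper: one iteration of the invalidate/update-key/map/
 * register cycle quoted above.  All arguments are assumed to be set up
 * by the caller; error handling is reduced to early returns.
 */
static int fast_reg_cycle(struct ib_qp *qp, struct ib_mr *mr,
			  struct scatterlist *sgl, int nents, u8 next_key)
{
	struct ib_send_wr inv_wr = {};
	struct ib_reg_wr reg_wr = {};
	const struct ib_send_wr *bad_wr;
	int n, ret;

	/* 1. Invalidate the MR's current registration. */
	inv_wr.opcode = IB_WR_LOCAL_INV;
	inv_wr.ex.invalidate_rkey = mr->rkey;
	ret = ib_post_send(qp, &inv_wr, &bad_wr);
	if (ret)
		return ret;

	/* 2. Advance the 8-bit key portion so the next registration uses
	 *    a different l/rkey than the one just invalidated.
	 */
	ib_update_fast_reg_key(mr, next_key);

	/* 3. Load the new page list into the MR. */
	n = ib_map_mr_sg(mr, sgl, nents, NULL, PAGE_SIZE);
	if (n != nents)
		return n < 0 ? n : -EINVAL;

	/* 4. Post the REG_MR work request that makes the new mapping and
	 *    key visible for local and remote access.
	 */
	reg_wr.wr.opcode = IB_WR_REG_MR;
	reg_wr.mr = mr;
	reg_wr.key = mr->rkey;
	reg_wr.access = IB_ACCESS_LOCAL_WRITE | IB_ACCESS_REMOTE_READ |
			IB_ACCESS_REMOTE_WRITE;
	return ib_post_send(qp, &reg_wr.wr, &bad_wr);
}
```

Because these work requests can sit on the send queue while older rkeys are still referenced by in-flight packets, the driver has to keep the key the rxe responder is using separate from the key the consumer has already updated, which is what patches 3 and 4 address.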
Hi Bob,

With this patch series, the blktests nvme-rdma test can still fail with the
error below; the same test passes with siw.

[ 1702.140090] loop0: detected capacity change from 0 to 2097152
[ 1702.150729] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 1702.151425] nvmet_rdma: enabling port 0 (10.16.221.116:4420)
[ 1702.158810] nvmet: creating controller 1 for subsystem blktests-subsystem-1 for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333.
[ 1702.159037] nvme nvme0: creating 32 I/O queues.
[ 1702.171671] nvme nvme0: failed to initialize MR pool sized 128 for QID 32
[ 1702.178482] nvme nvme0: rdma connection establishment failed (-12)
[ 1702.292261] eno2 speed is unknown, defaulting to 1000
[ 1702.297325] eno3 speed is unknown, defaulting to 1000
[ 1702.302389] eno4 speed is unknown, defaulting to 1000
[ 1702.317991] rdma_rxe: unloaded

Failure from:

	/*
	 * Currently we don't use SG_GAPS MR's so if the first entry is
	 * misaligned we'll end up using two entries for a single data page,
	 * so one additional entry is required.
	 */
	pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1;
	ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
			      queue->queue_size,
			      IB_MR_TYPE_MEM_REG,
			      pages_per_mr, 0);
	if (ret) {
		dev_err(queue->ctrl->ctrl.device,
			"failed to initialize MR pool sized %d for QID %d\n",
			queue->queue_size, nvme_rdma_queue_idx(queue));
		goto out_destroy_ring;
	}

On Wed, Sep 15, 2021 at 12:43 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
>
> This series of patches implements several bug fixes and minor
> cleanups of the rxe driver. Specifically these fix a bug exposed
> by blktest.
>
> They apply cleanly to both
> commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc)
> commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next)
>
> The first patch is a rewrite of an earlier patch.
> It adds memory barriers to kernel to kernel queues. The logic for this
> is the same as an earlier patch that only treated user to kernel queues.
> Without this patch kernel to kernel queues are expected to intermittently
> fail at low frequency as was seen for the other queues.
>
> The second patch cleans up the state and type enums used by MRs.
>
> The third patch separates the keys in rxe_mr and ib_mr. This allows
> the following sequence seen in the srp driver to work correctly.
>
> do {
>     ib_post_send( IB_WR_LOCAL_INV )
>     ib_update_fast_reg_key()
>     ib_map_mr_sg()
>     ib_post_send( IB_WR_REG_MR )
> } while ( !done )
>
> The fourth patch creates duplicate mapping tables for fast MRs. This
> prevents rkeys referencing fast MRs from accessing data from an updated
> map after the call to ib_map_mr_sg() call by keeping the new and old
> mappings separate and atomically swapping them when a reg mr WR is
> executed.
>
> The fifth patch checks the type of MRs which receive local or remote
> invalidate operations to prevent invalidating user MRs.
>
> v3->v4:
> Two of the patches in v3 were accepted in v5.15 so have been dropped
> here.
>
> The first patch was rewritten to correctly deal with queue operations
> in rxe_verbs.c where the code is the client and not the server.
>
> v2->v3:
> The v2 version had a typo which broke clean application to for-next.
> Additionally in v3 the order of the patches was changed to make
> it a little cleaner.
>
> Bob Pearson (5):
>   RDMA/rxe: Add memory barriers to kernel queues
>   RDMA/rxe: Cleanup MR status and type enums
>   RDMA/rxe: Separate HW and SW l/rkeys
>   RDMA/rxe: Create duplicate mapping tables for FMRs
>   RDMA/rxe: Only allow invalidate for appropriate MRs
>
>  drivers/infiniband/sw/rxe/rxe_comp.c  |  12 +-
>  drivers/infiniband/sw/rxe/rxe_cq.c    |  25 +--
>  drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +
>  drivers/infiniband/sw/rxe/rxe_mr.c    | 267 ++++++++++++++++-------
>  drivers/infiniband/sw/rxe/rxe_mw.c    |  36 ++--
>  drivers/infiniband/sw/rxe/rxe_qp.c    |  12 +-
>  drivers/infiniband/sw/rxe/rxe_queue.c |  30 ++-
>  drivers/infiniband/sw/rxe/rxe_queue.h | 292 +++++++++++---------------
>  drivers/infiniband/sw/rxe/rxe_req.c   |  51 ++---
>  drivers/infiniband/sw/rxe/rxe_resp.c  |  40 +---
>  drivers/infiniband/sw/rxe/rxe_srq.c   |   2 +-
>  drivers/infiniband/sw/rxe/rxe_verbs.c |  92 ++------
>  drivers/infiniband/sw/rxe/rxe_verbs.h |  48 ++---
>  13 files changed, 438 insertions(+), 471 deletions(-)
>
> --
> 2.30.2
>
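The -12 in the log above is -ENOMEM returned from ib_mr_pool_init() once rxe runs out of MR objects while nvme-rdma allocates a 128-entry pool for each of its 32 I/O queues. As a rough illustration only (not nvme-rdma code; the helper and its parameters are made up), a consumer could sanity-check its total MR demand against the device attributes before creating queues:

```c
#include <rdma/ib_verbs.h>
#include <rdma/mr_pool.h>

/* Hypothetical wrapper around ib_mr_pool_init(): refuse early if the
 * combined demand of all queues exceeds what the device advertises,
 * instead of failing on the last queue as in the log above.
 */
static int init_queue_mr_pool(struct ib_qp *qp, struct list_head *mrs,
			      u32 nr_queues, u32 queue_size,
			      u32 pages_per_mr)
{
	struct ib_device *dev = qp->device;

	/* Each I/O queue wants queue_size MEM_REG MRs; rxe caps the whole
	 * device at dev->attrs.max_mr registrations.
	 */
	if ((u64)nr_queues * queue_size > (u64)dev->attrs.max_mr)
		return -ENOMEM;

	return ib_mr_pool_init(qp, mrs, queue_size, IB_MR_TYPE_MEM_REG,
			       pages_per_mr, 0);
}
```

With rxe's current 4K cap, 32 queues of 128 MRs each already consume the entire device-wide budget, which is the conclusion Yi Zhang reaches in the analysis further down the thread.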
On Tue, Sep 14, 2021 at 12:49 PM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > This series of patches implements several bug fixes and minor > cleanups of the rxe driver. Specifically these fix a bug exposed > by blktest. > > They apply cleanly to both > commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc) > commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next) > > The first patch is a rewrite of an earlier patch. > It adds memory barriers to kernel to kernel queues. The logic for this > is the same as an earlier patch that only treated user to kernel queues. > Without this patch kernel to kernel queues are expected to intermittently > fail at low frequency as was seen for the other queues. > > The second patch cleans up the state and type enums used by MRs. > > The third patch separates the keys in rxe_mr and ib_mr. This allows > the following sequence seen in the srp driver to work correctly. > > do { > ib_post_send( IB_WR_LOCAL_INV ) > ib_update_fast_reg_key() > ib_map_mr_sg() > ib_post_send( IB_WR_REG_MR ) > } while ( !done ) > > The fourth patch creates duplicate mapping tables for fast MRs. This > prevents rkeys referencing fast MRs from accessing data from an updated > map after the call to ib_map_mr_sg() call by keeping the new and old > mappings separate and atomically swapping them when a reg mr WR is > executed. > > The fifth patch checks the type of MRs which receive local or remote > invalidate operations to prevent invalidating user MRs. > > v3->v4: > Two of the patches in v3 were accepted in v5.15 so have been dropped > here. > > The first patch was rewritten to correctly deal with queue operations > in rxe_verbs.c where the code is the client and not the server. > > v2->v3: > The v2 version had a typo which broke clean application to for-next. > Additionally in v3 the order of the patches was changed to make > it a little cleaner. Hi Bob, After applying these patches only top of 5.15-rc1. rping and mount NFSoRDMA works. Thank you. > > Bob Pearson (5): > RDMA/rxe: Add memory barriers to kernel queues > RDMA/rxe: Cleanup MR status and type enums > RDMA/rxe: Separate HW and SW l/rkeys > RDMA/rxe: Create duplicate mapping tables for FMRs > RDMA/rxe: Only allow invalidate for appropriate MRs > > drivers/infiniband/sw/rxe/rxe_comp.c | 12 +- > drivers/infiniband/sw/rxe/rxe_cq.c | 25 +-- > drivers/infiniband/sw/rxe/rxe_loc.h | 2 + > drivers/infiniband/sw/rxe/rxe_mr.c | 267 ++++++++++++++++------- > drivers/infiniband/sw/rxe/rxe_mw.c | 36 ++-- > drivers/infiniband/sw/rxe/rxe_qp.c | 12 +- > drivers/infiniband/sw/rxe/rxe_queue.c | 30 ++- > drivers/infiniband/sw/rxe/rxe_queue.h | 292 +++++++++++--------------- > drivers/infiniband/sw/rxe/rxe_req.c | 51 ++--- > drivers/infiniband/sw/rxe/rxe_resp.c | 40 +--- > drivers/infiniband/sw/rxe/rxe_srq.c | 2 +- > drivers/infiniband/sw/rxe/rxe_verbs.c | 92 ++------ > drivers/infiniband/sw/rxe/rxe_verbs.h | 48 ++--- > 13 files changed, 438 insertions(+), 471 deletions(-) > > -- > 2.30.2 >
On Thu, Sep 23, 2021 at 02:49:54PM -0400, Olga Kornievskaia wrote: > On Tue, Sep 14, 2021 at 12:49 PM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > > > This series of patches implements several bug fixes and minor > > cleanups of the rxe driver. Specifically these fix a bug exposed > > by blktest. > > > > They apply cleanly to both > > commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc) > > commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next) > > > > The first patch is a rewrite of an earlier patch. > > It adds memory barriers to kernel to kernel queues. The logic for this > > is the same as an earlier patch that only treated user to kernel queues. > > Without this patch kernel to kernel queues are expected to intermittently > > fail at low frequency as was seen for the other queues. > > > > The second patch cleans up the state and type enums used by MRs. > > > > The third patch separates the keys in rxe_mr and ib_mr. This allows > > the following sequence seen in the srp driver to work correctly. > > > > do { > > ib_post_send( IB_WR_LOCAL_INV ) > > ib_update_fast_reg_key() > > ib_map_mr_sg() > > ib_post_send( IB_WR_REG_MR ) > > } while ( !done ) > > > > The fourth patch creates duplicate mapping tables for fast MRs. This > > prevents rkeys referencing fast MRs from accessing data from an updated > > map after the call to ib_map_mr_sg() call by keeping the new and old > > mappings separate and atomically swapping them when a reg mr WR is > > executed. > > > > The fifth patch checks the type of MRs which receive local or remote > > invalidate operations to prevent invalidating user MRs. > > > > v3->v4: > > Two of the patches in v3 were accepted in v5.15 so have been dropped > > here. > > > > The first patch was rewritten to correctly deal with queue operations > > in rxe_verbs.c where the code is the client and not the server. > > > > v2->v3: > > The v2 version had a typo which broke clean application to for-next. > > Additionally in v3 the order of the patches was changed to make > > it a little cleaner. > > Hi Bob, > > After applying these patches only top of 5.15-rc1. rping and mount > NFSoRDMA works. I've lost track with all the mails, are we good on this series now or was there still more fixing needed? Jason
>>
>> Hi Bob,
>>
>> After applying these patches only top of 5.15-rc1. rping and mount
>> NFSoRDMA works.
>
> I've lost track with all the mails, are we good on this series now or
> was there still more fixing needed?
>
> Jason
>
There is one remaining report of a problem from Yi Zhang, on 9/17/21, who
sees an error in blktest. Unfortunately I am unable to figure out how to
reproduce the error and the code he shows is not directly in rxe. I suspect
it may be related to resource limits if anything which has nothing to do
with the patch series.

I believe they are all good.

Bob
On Tue, Sep 14, 2021 at 11:42:02AM -0500, Bob Pearson wrote: > This series of patches implements several bug fixes and minor > cleanups of the rxe driver. Specifically these fix a bug exposed > by blktest. > > They apply cleanly to both > commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc) > commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next) > > The first patch is a rewrite of an earlier patch. > It adds memory barriers to kernel to kernel queues. The logic for this > is the same as an earlier patch that only treated user to kernel queues. > Without this patch kernel to kernel queues are expected to intermittently > fail at low frequency as was seen for the other queues. > > The second patch cleans up the state and type enums used by MRs. > > The third patch separates the keys in rxe_mr and ib_mr. This allows > the following sequence seen in the srp driver to work correctly. > > do { > ib_post_send( IB_WR_LOCAL_INV ) > ib_update_fast_reg_key() > ib_map_mr_sg() > ib_post_send( IB_WR_REG_MR ) > } while ( !done ) > > The fourth patch creates duplicate mapping tables for fast MRs. This > prevents rkeys referencing fast MRs from accessing data from an updated > map after the call to ib_map_mr_sg() call by keeping the new and old > mappings separate and atomically swapping them when a reg mr WR is > executed. > > The fifth patch checks the type of MRs which receive local or remote > invalidate operations to prevent invalidating user MRs. > > v3->v4: > Two of the patches in v3 were accepted in v5.15 so have been dropped > here. > > The first patch was rewritten to correctly deal with queue operations > in rxe_verbs.c where the code is the client and not the server. > > v2->v3: > The v2 version had a typo which broke clean application to for-next. > Additionally in v3 the order of the patches was changed to make > it a little cleaner. > > Bob Pearson (5): > RDMA/rxe: Add memory barriers to kernel queues > RDMA/rxe: Cleanup MR status and type enums > RDMA/rxe: Separate HW and SW l/rkeys > RDMA/rxe: Create duplicate mapping tables for FMRs > RDMA/rxe: Only allow invalidate for appropriate MRs applied to for-next, this is a little too complicated for for-rc, and no fixes lines Thanks, Jason
On 9/24/21 6:54 AM, Jason Gunthorpe wrote: > On Tue, Sep 14, 2021 at 11:42:02AM -0500, Bob Pearson wrote: >> This series of patches implements several bug fixes and minor >> cleanups of the rxe driver. Specifically these fix a bug exposed >> by blktest. >> >> They apply cleanly to both >> commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc) >> commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next) >> >> The first patch is a rewrite of an earlier patch. >> It adds memory barriers to kernel to kernel queues. The logic for this >> is the same as an earlier patch that only treated user to kernel queues. >> Without this patch kernel to kernel queues are expected to intermittently >> fail at low frequency as was seen for the other queues. >> >> The second patch cleans up the state and type enums used by MRs. >> >> The third patch separates the keys in rxe_mr and ib_mr. This allows >> the following sequence seen in the srp driver to work correctly. >> >> do { >> ib_post_send( IB_WR_LOCAL_INV ) >> ib_update_fast_reg_key() >> ib_map_mr_sg() >> ib_post_send( IB_WR_REG_MR ) >> } while ( !done ) >> >> The fourth patch creates duplicate mapping tables for fast MRs. This >> prevents rkeys referencing fast MRs from accessing data from an updated >> map after the call to ib_map_mr_sg() call by keeping the new and old >> mappings separate and atomically swapping them when a reg mr WR is >> executed. >> >> The fifth patch checks the type of MRs which receive local or remote >> invalidate operations to prevent invalidating user MRs. >> >> v3->v4: >> Two of the patches in v3 were accepted in v5.15 so have been dropped >> here. >> >> The first patch was rewritten to correctly deal with queue operations >> in rxe_verbs.c where the code is the client and not the server. >> >> v2->v3: >> The v2 version had a typo which broke clean application to for-next. >> Additionally in v3 the order of the patches was changed to make >> it a little cleaner. >> >> Bob Pearson (5): >> RDMA/rxe: Add memory barriers to kernel queues >> RDMA/rxe: Cleanup MR status and type enums >> RDMA/rxe: Separate HW and SW l/rkeys >> RDMA/rxe: Create duplicate mapping tables for FMRs >> RDMA/rxe: Only allow invalidate for appropriate MRs > applied to for-next, this is a little too complicated for for-rc, and > no fixes lines > > Thanks, > Jason Jason can you pull in my patch also. Thanks, Shoaib
Hi Bob/Sagi Regarding the blktests nvme-rdma failure with rdma_rxe, I found the patch[3] reduce the number of MRs and MWs to 4K from 256K, which lead the nvme connect failed[1], if I change the "number of io queues to use" to 31, it will works[2], and the default io queues num is the CPU core count(mine is 48), that's why it failed on my environment. I also have tried revert patch[2], and the blktests run successfully on my server, could we change back to 256K again? [1] # nvme connect -t rdma -a 10.16.221.74 -s 4420 -n testnqn -i 32 Failed to write to /dev/nvme-fabrics: Cannot allocate memory # dmesg [ 857.546927] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-4310-8058-b9c04f325732. [ 857.547640] nvme nvme0: creating 32 I/O queues. [ 857.595285] nvme nvme0: failed to initialize MR pool sized 128 for QID 32 [ 857.602141] nvme nvme0: rdma connection establishment failed (-12) [2] # nvme connect -t rdma -a 10.16.221.74 -s 4420 -n testnqn -i 31 # dmesg [ 863.792874] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-4310-8058-b9c04f325732. [ 863.793638] nvme nvme0: creating 31 I/O queues. [ 863.845594] nvme nvme0: mapped 31/0/0 default/read/poll queues. [ 863.853633] nvme nvme0: new ctrl: NQN "testnqn", addr 10.16.221.74:4420 [3] commit af732adfacb2c6d886713624af2ff8e555c32aa4 Author: Bob Pearson <rpearsonhpe@gmail.com> Date: Mon Jun 7 23:25:46 2021 -0500 RDMA/rxe: Enable MW object pool Currently the rxe driver has a rxe_mw struct object but nothing about memory windows is enabled. This patch turns on memory windows and some minor cleanup. Set device attribute in rxe.c so max_mw = MAX_MW. Change parameters in rxe_param.h so that MAX_MW is the same as MAX_MR. Reduce the number of MRs and MWs to 4K from 256K. Add device capability bits for 2a and 2b memory windows. Removed RXE_MR_TYPE_MW from the rxe_mr_type enum. On Sat, Sep 18, 2021 at 10:13 AM Yi Zhang <yi.zhang@redhat.com> wrote: > > Hi Bob > With this patch serious, the blktests nvme-rdma still can be failed > with the below error. and the test can be pass with siw. > > [ 1702.140090] loop0: detected capacity change from 0 to 2097152 > [ 1702.150729] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 > [ 1702.151425] nvmet_rdma: enabling port 0 (10.16.221.116:4420) > [ 1702.158810] nvmet: creating controller 1 for subsystem > blktests-subsystem-1 for NQN > nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333. > [ 1702.159037] nvme nvme0: creating 32 I/O queues. > [ 1702.171671] nvme nvme0: failed to initialize MR pool sized 128 for QID 32 > [ 1702.178482] nvme nvme0: rdma connection establishment failed (-12) > [ 1702.292261] eno2 speed is unknown, defaulting to 1000 > [ 1702.297325] eno3 speed is unknown, defaulting to 1000 > [ 1702.302389] eno4 speed is unknown, defaulting to 1000 > [ 1702.317991] rdma_rxe: unloaded > > Failure from: > /* > * Currently we don't use SG_GAPS MR's so if the first entry is > * misaligned we'll end up using two entries for a single data page, > * so one additional entry is required. 
> */ > pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1; > ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs, > queue->queue_size, > IB_MR_TYPE_MEM_REG, > pages_per_mr, 0); > if (ret) { > dev_err(queue->ctrl->ctrl.device, > "failed to initialize MR pool sized %d for QID %d\n", > queue->queue_size, nvme_rdma_queue_idx(queue)); > goto out_destroy_ring; > } > > > On Wed, Sep 15, 2021 at 12:43 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > > > This series of patches implements several bug fixes and minor > > cleanups of the rxe driver. Specifically these fix a bug exposed > > by blktest. > > > > They apply cleanly to both > > commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc) > > commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next) > > > > The first patch is a rewrite of an earlier patch. > > It adds memory barriers to kernel to kernel queues. The logic for this > > is the same as an earlier patch that only treated user to kernel queues. > > Without this patch kernel to kernel queues are expected to intermittently > > fail at low frequency as was seen for the other queues. > > > > The second patch cleans up the state and type enums used by MRs. > > > > The third patch separates the keys in rxe_mr and ib_mr. This allows > > the following sequence seen in the srp driver to work correctly. > > > > do { > > ib_post_send( IB_WR_LOCAL_INV ) > > ib_update_fast_reg_key() > > ib_map_mr_sg() > > ib_post_send( IB_WR_REG_MR ) > > } while ( !done ) > > > > The fourth patch creates duplicate mapping tables for fast MRs. This > > prevents rkeys referencing fast MRs from accessing data from an updated > > map after the call to ib_map_mr_sg() call by keeping the new and old > > mappings separate and atomically swapping them when a reg mr WR is > > executed. > > > > The fifth patch checks the type of MRs which receive local or remote > > invalidate operations to prevent invalidating user MRs. > > > > v3->v4: > > Two of the patches in v3 were accepted in v5.15 so have been dropped > > here. > > > > The first patch was rewritten to correctly deal with queue operations > > in rxe_verbs.c where the code is the client and not the server. > > > > v2->v3: > > The v2 version had a typo which broke clean application to for-next. > > Additionally in v3 the order of the patches was changed to make > > it a little cleaner. > > > > Bob Pearson (5): > > RDMA/rxe: Add memory barriers to kernel queues > > RDMA/rxe: Cleanup MR status and type enums > > RDMA/rxe: Separate HW and SW l/rkeys > > RDMA/rxe: Create duplicate mapping tables for FMRs > > RDMA/rxe: Only allow invalidate for appropriate MRs > > > > drivers/infiniband/sw/rxe/rxe_comp.c | 12 +- > > drivers/infiniband/sw/rxe/rxe_cq.c | 25 +-- > > drivers/infiniband/sw/rxe/rxe_loc.h | 2 + > > drivers/infiniband/sw/rxe/rxe_mr.c | 267 ++++++++++++++++------- > > drivers/infiniband/sw/rxe/rxe_mw.c | 36 ++-- > > drivers/infiniband/sw/rxe/rxe_qp.c | 12 +- > > drivers/infiniband/sw/rxe/rxe_queue.c | 30 ++- > > drivers/infiniband/sw/rxe/rxe_queue.h | 292 +++++++++++--------------- > > drivers/infiniband/sw/rxe/rxe_req.c | 51 ++--- > > drivers/infiniband/sw/rxe/rxe_resp.c | 40 +--- > > drivers/infiniband/sw/rxe/rxe_srq.c | 2 +- > > drivers/infiniband/sw/rxe/rxe_verbs.c | 92 ++------ > > drivers/infiniband/sw/rxe/rxe_verbs.h | 48 ++--- > > 13 files changed, 438 insertions(+), 471 deletions(-) > > > > -- > > 2.30.2 > > > > > -- > Best Regards, > Yi Zhang
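A back-of-the-envelope check of the numbers in [1], [2] and [3] above, as a minimal stand-alone program. The 4096 value assumes the post-af732adf MR cap described in the commit message (the exact RXE_MAX_MR constant lives in rxe_param.h); 128 is the per-queue MR pool size from the log.

```c
#include <stdio.h>

int main(void)
{
	const unsigned int max_mr = 4096;    /* assumed rxe cap after the MW patch */
	const unsigned int queue_size = 128; /* MR pool size per nvme-rdma I/O queue */
	unsigned int queues;

	/* 32 queues alone already need the whole budget, so once the admin
	 * queue and any other registrations are counted the pool for QID 32
	 * cannot be filled (-ENOMEM); 31 queues leave headroom, matching the
	 * working "-i 31" case.
	 */
	for (queues = 31; queues <= 32; queues++)
		printf("%u I/O queues x %u MRs = %u (device limit %u)\n",
		       queues, queue_size, queues * queue_size, max_mr);
	return 0;
}
```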
Sorry that was my fault. 256K seemed way too big but 4K is too small. This whole area has been problematic lately. There are several use cases for rxe and one is to be an easy test target which is hard if you can't reach the limits. On the other hand if you want to really use it you just want the limits to get out of the way. One way to fix this would be to add a bunch of module parameters but there are too many attributes to be practical. Another would be to have 2 or 3 attribute sets (large, medium, small) that you could pick with one module parameter. Or, you could have a configuration file somewhere that the driver could read but seems messy. Or you could pass strings to one module parameter that could be parsed to set the attributes. If anyone has a preference I'd be happy to write something. Bob -----Original Message----- From: Yi Zhang <yi.zhang@redhat.com> Sent: Saturday, September 25, 2021 2:56 AM To: Bob Pearson <rpearsonhpe@gmail.com>; Sagi Grimberg <sagi@grimberg.me> Cc: Jason Gunthorpe <jgg@nvidia.com>; Zhu Yanjun <zyjzyj2000@gmail.com>; RDMA mailing list <linux-rdma@vger.kernel.org>; Bart Van Assche <bvanassche@acm.org>; mie@igel.co.jp; rao.shoaib@oracle.com Subject: Re: [PATCH for-rc v4 0/5] RDMA/rxe: Various bug fixes. Hi Bob/Sagi Regarding the blktests nvme-rdma failure with rdma_rxe, I found the patch[3] reduce the number of MRs and MWs to 4K from 256K, which lead the nvme connect failed[1], if I change the "number of io queues to use" to 31, it will works[2], and the default io queues num is the CPU core count(mine is 48), that's why it failed on my environment. I also have tried revert patch[2], and the blktests run successfully on my server, could we change back to 256K again? [1] # nvme connect -t rdma -a 10.16.221.74 -s 4420 -n testnqn -i 32 Failed to write to /dev/nvme-fabrics: Cannot allocate memory # dmesg [ 857.546927] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-4310-8058-b9c04f325732. [ 857.547640] nvme nvme0: creating 32 I/O queues. [ 857.595285] nvme nvme0: failed to initialize MR pool sized 128 for QID 32 [ 857.602141] nvme nvme0: rdma connection establishment failed (-12) [2] # nvme connect -t rdma -a 10.16.221.74 -s 4420 -n testnqn -i 31 # dmesg [ 863.792874] nvmet: creating controller 1 for subsystem testnqn for NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-4310-8058-b9c04f325732. [ 863.793638] nvme nvme0: creating 31 I/O queues. [ 863.845594] nvme nvme0: mapped 31/0/0 default/read/poll queues. [ 863.853633] nvme nvme0: new ctrl: NQN "testnqn", addr 10.16.221.74:4420 [3] commit af732adfacb2c6d886713624af2ff8e555c32aa4 Author: Bob Pearson <rpearsonhpe@gmail.com> Date: Mon Jun 7 23:25:46 2021 -0500 RDMA/rxe: Enable MW object pool Currently the rxe driver has a rxe_mw struct object but nothing about memory windows is enabled. This patch turns on memory windows and some minor cleanup. Set device attribute in rxe.c so max_mw = MAX_MW. Change parameters in rxe_param.h so that MAX_MW is the same as MAX_MR. Reduce the number of MRs and MWs to 4K from 256K. Add device capability bits for 2a and 2b memory windows. Removed RXE_MR_TYPE_MW from the rxe_mr_type enum. On Sat, Sep 18, 2021 at 10:13 AM Yi Zhang <yi.zhang@redhat.com> wrote: > > Hi Bob > With this patch serious, the blktests nvme-rdma still can be failed > with the below error. and the test can be pass with siw. 
> > [ 1702.140090] loop0: detected capacity change from 0 to 2097152 [ > 1702.150729] nvmet: adding nsid 1 to subsystem blktests-subsystem-1 [ > 1702.151425] nvmet_rdma: enabling port 0 (10.16.221.116:4420) [ > 1702.158810] nvmet: creating controller 1 for subsystem > blktests-subsystem-1 for NQN > nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333. > [ 1702.159037] nvme nvme0: creating 32 I/O queues. > [ 1702.171671] nvme nvme0: failed to initialize MR pool sized 128 for > QID 32 [ 1702.178482] nvme nvme0: rdma connection establishment failed > (-12) [ 1702.292261] eno2 speed is unknown, defaulting to 1000 [ > 1702.297325] eno3 speed is unknown, defaulting to 1000 [ 1702.302389] > eno4 speed is unknown, defaulting to 1000 [ 1702.317991] rdma_rxe: > unloaded > > Failure from: > /* > * Currently we don't use SG_GAPS MR's so if the first entry is > * misaligned we'll end up using two entries for a single data page, > * so one additional entry is required. > */ > pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1; > ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs, > queue->queue_size, > IB_MR_TYPE_MEM_REG, > pages_per_mr, 0); > if (ret) { > dev_err(queue->ctrl->ctrl.device, > "failed to initialize MR pool sized %d for QID %d\n", > queue->queue_size, nvme_rdma_queue_idx(queue)); > goto out_destroy_ring; > } > > > On Wed, Sep 15, 2021 at 12:43 AM Bob Pearson <rpearsonhpe@gmail.com> wrote: > > > > This series of patches implements several bug fixes and minor > > cleanups of the rxe driver. Specifically these fix a bug exposed by > > blktest. > > > > They apply cleanly to both > > commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc) > > commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next) > > > > The first patch is a rewrite of an earlier patch. > > It adds memory barriers to kernel to kernel queues. The logic for > > this is the same as an earlier patch that only treated user to kernel queues. > > Without this patch kernel to kernel queues are expected to > > intermittently fail at low frequency as was seen for the other queues. > > > > The second patch cleans up the state and type enums used by MRs. > > > > The third patch separates the keys in rxe_mr and ib_mr. This allows > > the following sequence seen in the srp driver to work correctly. > > > > do { > > ib_post_send( IB_WR_LOCAL_INV ) > > ib_update_fast_reg_key() > > ib_map_mr_sg() > > ib_post_send( IB_WR_REG_MR ) > > } while ( !done ) > > > > The fourth patch creates duplicate mapping tables for fast MRs. This > > prevents rkeys referencing fast MRs from accessing data from an > > updated map after the call to ib_map_mr_sg() call by keeping the new > > and old mappings separate and atomically swapping them when a reg mr > > WR is executed. > > > > The fifth patch checks the type of MRs which receive local or remote > > invalidate operations to prevent invalidating user MRs. > > > > v3->v4: > > Two of the patches in v3 were accepted in v5.15 so have been dropped > > here. > > > > The first patch was rewritten to correctly deal with queue > > operations in rxe_verbs.c where the code is the client and not the server. > > > > v2->v3: > > The v2 version had a typo which broke clean application to for-next. > > Additionally in v3 the order of the patches was changed to make it a > > little cleaner. 
> > > > Bob Pearson (5): > > RDMA/rxe: Add memory barriers to kernel queues > > RDMA/rxe: Cleanup MR status and type enums > > RDMA/rxe: Separate HW and SW l/rkeys > > RDMA/rxe: Create duplicate mapping tables for FMRs > > RDMA/rxe: Only allow invalidate for appropriate MRs > > > > drivers/infiniband/sw/rxe/rxe_comp.c | 12 +- > > drivers/infiniband/sw/rxe/rxe_cq.c | 25 +-- > > drivers/infiniband/sw/rxe/rxe_loc.h | 2 + > > drivers/infiniband/sw/rxe/rxe_mr.c | 267 ++++++++++++++++------- > > drivers/infiniband/sw/rxe/rxe_mw.c | 36 ++-- > > drivers/infiniband/sw/rxe/rxe_qp.c | 12 +- > > drivers/infiniband/sw/rxe/rxe_queue.c | 30 ++- > > drivers/infiniband/sw/rxe/rxe_queue.h | 292 +++++++++++--------------- > > drivers/infiniband/sw/rxe/rxe_req.c | 51 ++--- > > drivers/infiniband/sw/rxe/rxe_resp.c | 40 +--- > > drivers/infiniband/sw/rxe/rxe_srq.c | 2 +- > > drivers/infiniband/sw/rxe/rxe_verbs.c | 92 ++------ > > drivers/infiniband/sw/rxe/rxe_verbs.h | 48 ++--- > > 13 files changed, 438 insertions(+), 471 deletions(-) > > > > -- > > 2.30.2 > > > > > -- > Best Regards, > Yi Zhang -- Best Regards, Yi Zhang
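To make the "two or three attribute sets selected by one module parameter" option above concrete, here is a sketch of what that could look like. Everything in it is hypothetical: the parameter name, the profile values and the rxe_attr_profile structure do not exist in the driver; it only illustrates the shape of the proposal.

```c
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>

static char *attr_profile = "medium";
module_param(attr_profile, charp, 0444);
MODULE_PARM_DESC(attr_profile,
		 "Device attribute profile: small, medium or large (hypothetical)");

/* Hypothetical profile table: one knob picks a whole set of limits. */
struct rxe_attr_profile {
	const char *name;
	u32 max_mr;
	u32 max_mw;
	u32 max_qp;
};

static const struct rxe_attr_profile rxe_profiles[] = {
	{ .name = "small",  .max_mr =   4096, .max_mw =   4096, .max_qp =  1024 },
	{ .name = "medium", .max_mr =  65536, .max_mw =  65536, .max_qp = 16384 },
	{ .name = "large",  .max_mr = 262144, .max_mw = 262144, .max_qp = 65536 },
};

static const struct rxe_attr_profile *rxe_pick_profile(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(rxe_profiles); i++)
		if (!strcmp(attr_profile, rxe_profiles[i].name))
			return &rxe_profiles[i];

	/* Unrecognized string: fall back to the middle-of-the-road set. */
	return &rxe_profiles[1];
}
```

Compared with one module parameter per attribute this keeps the knob count at one, at the cost of coarser control; testers who need to reach the old 256K-scale limits would load with attr_profile=large while the default stays modest.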