[for-rc,v4,0/5] RDMA/rxe: Various bug fixes.

Message ID: 20210914164206.19768-1-rpearsonhpe@gmail.com

Message

Bob Pearson Sept. 14, 2021, 4:42 p.m. UTC
This series of patches implements several bug fixes and minor
cleanups of the rxe driver. Specifically, they fix a bug exposed
by blktests.

They apply cleanly to both
commit 1b789bd4dbd48a92f5427d9c37a72a8f6ca17754 (origin/for-rc)
commit 6a217437f9f5482a3f6f2dc5fcd27cf0f62409ac (origin/for-next)

The first patch is a rewrite of an earlier patch. It adds memory
barriers to kernel-to-kernel queues. The logic is the same as in an
earlier patch that only treated user-to-kernel queues. Without this
patch, kernel-to-kernel queues are expected to fail intermittently at
low frequency, as was seen for the other queues.
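
The barrier discipline at issue is the usual single-producer /
single-consumer ring protocol. As a reference point, here is a minimal
sketch of that protocol; the names are illustrative and this is not
the actual rxe_queue.h code:

	/* illustrative SPSC ring; not the rxe_queue.h implementation */
	struct ring {
		u32	prod;	/* written by producer, read by consumer */
		u32	cons;	/* written by consumer, read by producer */
		u32	mask;	/* size - 1, size a power of two */
		void	*slot[];
	};

	static bool ring_post(struct ring *r, void *elem)
	{
		u32 prod = r->prod;

		/* pairs with the consumer's store-release of cons */
		if (prod - smp_load_acquire(&r->cons) > r->mask)
			return false;	/* full */
		r->slot[prod & r->mask] = elem;
		/* make the slot visible before the new producer index */
		smp_store_release(&r->prod, prod + 1);
		return true;
	}

	static void *ring_poll(struct ring *r)
	{
		u32 cons = r->cons;
		void *elem;

		/* pairs with the producer's store-release of prod */
		if (cons == smp_load_acquire(&r->prod))
			return NULL;	/* empty */
		elem = r->slot[cons & r->mask];
		/* free the slot only after its contents are read */
		smp_store_release(&r->cons, cons + 1);
		return elem;
	}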

The second patch cleans up the state and type enums used by MRs.

The third patch separates the keys in rxe_mr and ib_mr. This allows the
following sequence, seen in the srp driver, to work correctly.

	do {
		ib_post_send( IB_WR_LOCAL_INV )
		ib_update_fast_reg_key()
		ib_map_mr_sg()
		ib_post_send( IB_WR_REG_MR )
	} while ( !done )
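
The point is that ib_update_fast_reg_key() only changes the key the
ULP sees in ib_mr; the key that rxe checks on the wire must not change
until the REG_MR WQE actually executes. A rough sketch of the split,
with field and function names that are assumptions rather than the
actual rxe code:

	/* illustrative; real names/layout in rxe_verbs.h may differ */
	struct rxe_mr {
		struct ib_mr	ibmr;	/* ibmr.lkey/ibmr.rkey: what the
					 * ULP sees, updated at once by
					 * ib_update_fast_reg_key() */
		u32		lkey;	/* what the rxe requester and */
		u32		rkey;	/* responder check on the wire */
	};

	/* runs when the IB_WR_REG_MR WQE executes, not at map time */
	static void rxe_commit_fast_mr_keys(struct rxe_mr *mr)
	{
		mr->lkey = mr->ibmr.lkey;
		mr->rkey = mr->ibmr.rkey;
	}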

The fourth patch creates duplicate mapping tables for fast MRs. This
prevents rkeys referencing fast MRs from accessing data through an
updated map after the call to ib_map_mr_sg(), by keeping the new and
old mappings separate and atomically swapping them when a REG_MR WR
is executed.
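
In other words, a double-buffering scheme: ib_map_mr_sg() fills the
inactive table and the REG_MR WQE makes it live. A rough sketch under
that reading, with hypothetical names:

	/* hypothetical sketch of the duplicate mapping tables */
	struct rxe_map_set {
		struct rxe_map	**map;	/* address translation table */
		u64		iova;
		size_t		length;
	};

	struct rxe_fmr {	/* illustrative, not the real rxe_mr */
		struct rxe_map_set *cur_map_set;  /* used by traffic */
		struct rxe_map_set *next_map_set; /* ib_map_mr_sg() fills */
	};

	/* on IB_WR_REG_MR execution the staged mapping becomes live */
	static void rxe_swap_map_sets(struct rxe_fmr *mr)
	{
		swap(mr->cur_map_set, mr->next_map_set);
	}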

The fifth patch checks the type of MRs that receive local or remote
invalidate operations, to prevent invalidating user MRs.
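
A minimal sketch of such a guard, assuming the IB_MR_TYPE_* values
(the exact check and error code in the patch may differ):

	/* illustrative guard; only fast-reg MRs may be invalidated */
	static int rxe_check_invalidate_mr(struct rxe_mr *mr)
	{
		if (mr->type != IB_MR_TYPE_MEM_REG)
			return -EINVAL;	/* e.g. a user MR */

		return 0;
	}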

v3->v4:
Two of the patches in v3 were accepted into v5.15, so they have been
dropped here.

The first patch was rewritten to correctly deal with queue operations
in rxe_verbs.c where the code is the client and not the server.

v2->v3:
v2 had a typo that broke clean application to for-next. Additionally,
in v3 the order of the patches was changed to make the series a little
cleaner.

Bob Pearson (5):
  RDMA/rxe: Add memory barriers to kernel queues
  RDMA/rxe: Cleanup MR status and type enums
  RDMA/rxe: Separate HW and SW l/rkeys
  RDMA/rxe: Create duplicate mapping tables for FMRs
  RDMA/rxe: Only allow invalidate for appropriate MRs

 drivers/infiniband/sw/rxe/rxe_comp.c  |  12 +-
 drivers/infiniband/sw/rxe/rxe_cq.c    |  25 +--
 drivers/infiniband/sw/rxe/rxe_loc.h   |   2 +
 drivers/infiniband/sw/rxe/rxe_mr.c    | 267 ++++++++++++++++-------
 drivers/infiniband/sw/rxe/rxe_mw.c    |  36 ++--
 drivers/infiniband/sw/rxe/rxe_qp.c    |  12 +-
 drivers/infiniband/sw/rxe/rxe_queue.c |  30 ++-
 drivers/infiniband/sw/rxe/rxe_queue.h | 292 +++++++++++---------------
 drivers/infiniband/sw/rxe/rxe_req.c   |  51 ++---
 drivers/infiniband/sw/rxe/rxe_resp.c  |  40 +---
 drivers/infiniband/sw/rxe/rxe_srq.c   |   2 +-
 drivers/infiniband/sw/rxe/rxe_verbs.c |  92 ++------
 drivers/infiniband/sw/rxe/rxe_verbs.h |  48 ++---
 13 files changed, 438 insertions(+), 471 deletions(-)

Comments

Shoaib Rao Sept. 15, 2021, 12:07 a.m. UTC | #1
Hi Bob, I can verify that rping works after applying this patch series.

Thanks.

Shoaib


On 9/14/21 9:42 AM, Bob Pearson wrote:
> [...]
Bob Pearson Sept. 15, 2021, 12:58 a.m. UTC | #2
On 9/14/21 7:07 PM, Shoaib Rao wrote:
> Hi Bob, I can verify that rping works after applying this patch series.
> 
> Thanks.
> 
> Shoaib
> 
> 
Thanks,

I appreciate your quick response.

Bob
Yi Zhang Sept. 18, 2021, 2:13 a.m. UTC | #3
Hi Bob
With this patch series, the blktests nvme-rdma test can still fail
with the error below, while the test passes with siw.

[ 1702.140090] loop0: detected capacity change from 0 to 2097152
[ 1702.150729] nvmet: adding nsid 1 to subsystem blktests-subsystem-1
[ 1702.151425] nvmet_rdma: enabling port 0 (10.16.221.116:4420)
[ 1702.158810] nvmet: creating controller 1 for subsystem
blktests-subsystem-1 for NQN
nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0035-4b10-8044-b9c04f463333.
[ 1702.159037] nvme nvme0: creating 32 I/O queues.
[ 1702.171671] nvme nvme0: failed to initialize MR pool sized 128 for QID 32
[ 1702.178482] nvme nvme0: rdma connection establishment failed (-12)
[ 1702.292261] eno2 speed is unknown, defaulting to 1000
[ 1702.297325] eno3 speed is unknown, defaulting to 1000
[ 1702.302389] eno4 speed is unknown, defaulting to 1000
[ 1702.317991] rdma_rxe: unloaded

Failure from:
        /*
         * Currently we don't use SG_GAPS MR's so if the first entry is
         * misaligned we'll end up using two entries for a single data page,
         * so one additional entry is required.
         */
        pages_per_mr = nvme_rdma_get_max_fr_pages(ibdev, queue->pi_support) + 1;
        ret = ib_mr_pool_init(queue->qp, &queue->qp->rdma_mrs,
                              queue->queue_size,
                              IB_MR_TYPE_MEM_REG,
                              pages_per_mr, 0);
        if (ret) {
                dev_err(queue->ctrl->ctrl.device,
                        "failed to initialize MR pool sized %d for QID %d\n",
                        queue->queue_size, nvme_rdma_queue_idx(queue));
                goto out_destroy_ring;
        }


On Wed, Sep 15, 2021 at 12:43 AM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> [...]
Olga Kornievskaia Sept. 23, 2021, 6:49 p.m. UTC | #4
On Tue, Sep 14, 2021 at 12:49 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> [...]

Hi Bob,

After applying these patches on top of 5.15-rc1, rping and mounting
NFSoRDMA work.

Thank you.

Jason Gunthorpe Sept. 23, 2021, 7:56 p.m. UTC | #5
On Thu, Sep 23, 2021 at 02:49:54PM -0400, Olga Kornievskaia wrote:
> On Tue, Sep 14, 2021 at 12:49 PM Bob Pearson <rpearsonhpe@gmail.com> wrote:
> > [...]
> 
> Hi Bob,
> 
> After applying these patches on top of 5.15-rc1, rping and mounting
> NFSoRDMA work.

I've lost track with all the mails, are we good on this series now or
was there still more fixing needed?

Jason
Bob Pearson Sept. 24, 2021, 1:25 a.m. UTC | #6
>>
>> Hi Bob,
>>
>> After applying these patches on top of 5.15-rc1, rping and mounting
>> NFSoRDMA work.
> 
> I've lost track with all the mails, are we good on this series now or
> was there still more fixing needed?
> 
> Jason
> 

There is one remaining report of a problem, from Yi Zhang on 9/17/21, who
sees an error in blktests. Unfortunately I am unable to figure out how to
reproduce the error, and the code he shows is not directly in rxe. I
suspect it may be related to resource limits, which would have nothing to
do with this patch series. I believe the patches are all good.

Bob
Jason Gunthorpe Sept. 24, 2021, 1:54 p.m. UTC | #7
On Tue, Sep 14, 2021 at 11:42:02AM -0500, Bob Pearson wrote:
> [...]

Applied to for-next; this is a little too complicated for for-rc, and
there are no Fixes: lines.

Thanks,
Jason
Shoaib Rao Sept. 24, 2021, 3:44 p.m. UTC | #8
On 9/24/21 6:54 AM, Jason Gunthorpe wrote:
> On Tue, Sep 14, 2021 at 11:42:02AM -0500, Bob Pearson wrote:
>> [...]
> Applied to for-next; this is a little too complicated for for-rc, and
> there are no Fixes: lines.
>
> Thanks,
> Jason

Jason, can you pull in my patch also?

Thanks,

Shoaib
Yi Zhang Sept. 25, 2021, 7:55 a.m. UTC | #9
Hi Bob/Sagi

Regarding the blktests nvme-rdma failure with rdma_rxe: I found that
patch [3] reduces the number of MRs and MWs to 4K from 256K, which
makes the nvme connect fail [1]. If I change the "number of io queues
to use" to 31, it works [2]. The default io queue count is the CPU
core count (mine is 48), which is why it failed in my environment.

I have also tried reverting patch [3], and blktests then runs
successfully on my server. Could we change it back to 256K?

[1]
# nvme connect -t rdma -a 10.16.221.74 -s 4420 -n testnqn  -i 32
Failed to write to /dev/nvme-fabrics: Cannot allocate memory
# dmesg
[  857.546927] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-4310-8058-b9c04f325732.
[  857.547640] nvme nvme0: creating 32 I/O queues.
[  857.595285] nvme nvme0: failed to initialize MR pool sized 128 for QID 32
[  857.602141] nvme nvme0: rdma connection establishment failed (-12)
[2]
# nvme connect -t rdma -a 10.16.221.74 -s 4420 -n testnqn  -i 31
# dmesg
[  863.792874] nvmet: creating controller 1 for subsystem testnqn for
NQN nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0030-4310-8058-b9c04f325732.
[  863.793638] nvme nvme0: creating 31 I/O queues.
[  863.845594] nvme nvme0: mapped 31/0/0 default/read/poll queues.
[  863.853633] nvme nvme0: new ctrl: NQN "testnqn", addr 10.16.221.74:4420
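
A quick back-of-the-envelope against the 4K (4096) MR limit from [3],
assuming one 128-entry MR pool per I/O queue plus a handful of MRs for
the admin queue:

    31 I/O queues * 128 MRs = 3968  -> fits under 4096
    32 I/O queues * 128 MRs = 4096  -> no headroom left for the admin
                                       queue, so the pool for QID 32
                                       fails with -ENOMEM (-12)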

[3]
commit af732adfacb2c6d886713624af2ff8e555c32aa4
Author: Bob Pearson <rpearsonhpe@gmail.com>
Date:   Mon Jun 7 23:25:46 2021 -0500

    RDMA/rxe: Enable MW object pool

    Currently the rxe driver has a rxe_mw struct object but nothing about
    memory windows is enabled. This patch turns on memory windows and some
    minor cleanup.

    Set device attribute in rxe.c so max_mw = MAX_MW.  Change parameters in
    rxe_param.h so that MAX_MW is the same as MAX_MR.  Reduce the number of
    MRs and MWs to 4K from 256K.  Add device capability bits for 2a and 2b
    memory windows.  Removed RXE_MR_TYPE_MW from the rxe_mr_type enum.



On Sat, Sep 18, 2021 at 10:13 AM Yi Zhang <yi.zhang@redhat.com> wrote:
> [...]
Pearson, Robert B Sept. 25, 2021, 8:32 p.m. UTC | #10
Sorry, that was my fault. 256K seemed way too big, but 4K is too small.
This whole area has been problematic lately. There are several use
cases for rxe, and one is to be an easy test target, which is hard if
you can't reach the limits. On the other hand, if you want to really
use it, you just want the limits to get out of the way.

One way to fix this would be to add a bunch of module parameters, but
there are too many attributes for that to be practical. Another would
be to have 2 or 3 attribute sets (large, medium, small) that you could
pick with one module parameter. Or you could have a configuration file
somewhere that the driver could read, but that seems messy. Or you
could pass strings to one module parameter that could be parsed to set
the attributes. If anyone has a preference I'd be happy to write
something.
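
For the record, the single-parameter profile idea could look roughly
like this; entirely illustrative, not proposed code:

	/* illustrative "profile" module parameter for rxe limits */
	static char *attr_profile = "medium";
	module_param(attr_profile, charp, 0444);
	MODULE_PARM_DESC(attr_profile,
			 "Device attribute profile: small, medium or large");

	struct rxe_attr_profile {
		const char *name;
		int max_mr;
		int max_mw;
	};

	static const struct rxe_attr_profile profiles[] = {
		{ "small",  1 << 10, 1 << 10 },	/* easy on memory */
		{ "medium", 1 << 12, 1 << 12 },	/* 4K, current value */
		{ "large",  1 << 18, 1 << 18 },	/* 256K, the old value */
	};

	static const struct rxe_attr_profile *rxe_lookup_profile(void)
	{
		int i;

		for (i = 0; i < ARRAY_SIZE(profiles); i++)
			if (!strcmp(attr_profile, profiles[i].name))
				return &profiles[i];

		return &profiles[1];	/* default to medium */
	}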

Bob

-----Original Message-----
From: Yi Zhang <yi.zhang@redhat.com> 
Sent: Saturday, September 25, 2021 2:56 AM
To: Bob Pearson <rpearsonhpe@gmail.com>; Sagi Grimberg <sagi@grimberg.me>
Cc: Jason Gunthorpe <jgg@nvidia.com>; Zhu Yanjun <zyjzyj2000@gmail.com>; RDMA mailing list <linux-rdma@vger.kernel.org>; Bart Van Assche <bvanassche@acm.org>; mie@igel.co.jp; rao.shoaib@oracle.com
Subject: Re: [PATCH for-rc v4 0/5] RDMA/rxe: Various bug fixes.

[...]