Message ID | 20240604194522.10390-6-cel@kernel.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/5] xprtrdma: Fix rpcrdma_reqs_reset() | expand |
On 04/06/2024 22:45, cel@kernel.org wrote: > From: Chuck Lever <chuck.lever@oracle.com> > > Avoid FastReg operations getting MW_BIND_ERR after a reconnect. > > rpcrdma_reqs_reset() is called on transport tear-down to get each > rpcrdma_req back into a clean state. > > MRs on req->rl_registered are waiting for a FastReg, are already > registered, or are waiting for invalidation. If the transport is > being torn down when reqs_reset() is called, the matching LocalInv > might never be posted. That leaves these MR registered /and/ on > req->rl_free_mrs, where they can be re-used for the next > connection. > > Since xprtrdma does not keep specific track of the MR state, it's > not possible to know what state these MRs are in, so the only safe > thing to do is release them immediately. > > Fixes: 5de55ce951a1 ("xprtrdma: Release in-flight MRs on disconnect") > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > net/sunrpc/xprtrdma/frwr_ops.c | 3 ++- > net/sunrpc/xprtrdma/verbs.c | 16 +++++++++++++++- > 2 files changed, 17 insertions(+), 2 deletions(-) > > diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c > index ffbf99894970..47f33bb7bff8 100644 > --- a/net/sunrpc/xprtrdma/frwr_ops.c > +++ b/net/sunrpc/xprtrdma/frwr_ops.c > @@ -92,7 +92,8 @@ static void frwr_mr_put(struct rpcrdma_mr *mr) > rpcrdma_mr_push(mr, &mr->mr_req->rl_free_mrs); > } > > -/* frwr_reset - Place MRs back on the free list > +/** > + * frwr_reset - Place MRs back on @req's free list > * @req: request to reset > * > * Used after a failed marshal. For FRWR, this means the MRs > diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c > index 432557a553e7..a0b071089e15 100644 > --- a/net/sunrpc/xprtrdma/verbs.c > +++ b/net/sunrpc/xprtrdma/verbs.c > @@ -897,6 +897,8 @@ static int rpcrdma_reqs_setup(struct rpcrdma_xprt *r_xprt) > > static void rpcrdma_req_reset(struct rpcrdma_req *req) > { > + struct rpcrdma_mr *mr; > + > /* Credits are valid for only one connection */ > req->rl_slot.rq_cong = 0; > > @@ -906,7 +908,19 @@ static void rpcrdma_req_reset(struct rpcrdma_req *req) > rpcrdma_regbuf_dma_unmap(req->rl_sendbuf); > rpcrdma_regbuf_dma_unmap(req->rl_recvbuf); > > - frwr_reset(req); > + /* The verbs consumer can't know the state of an MR on the > + * req->rl_registered list unless a successful completion > + * has occurred, so they cannot be re-used. > + */ Theoretically it would be possible to know by inspecting the FLUSH error completions and see which of those was a registration wr (meaning the registration was not ever executed). But I guess its simpler to just assume that it isn't reusable. > + while ((mr = rpcrdma_mr_pop(&req->rl_registered))) { > + struct rpcrdma_buffer *buf = &mr->mr_xprt->rx_buf; > + > + spin_lock(&buf->rb_lock); > + list_del(&mr->mr_all); > + spin_unlock(&buf->rb_lock); > + > + frwr_mr_release(mr); > + } > } > > /* ASSUMPTION: the rb_allreqs list is stable for the duration,
diff --git a/net/sunrpc/xprtrdma/frwr_ops.c b/net/sunrpc/xprtrdma/frwr_ops.c index ffbf99894970..47f33bb7bff8 100644 --- a/net/sunrpc/xprtrdma/frwr_ops.c +++ b/net/sunrpc/xprtrdma/frwr_ops.c @@ -92,7 +92,8 @@ static void frwr_mr_put(struct rpcrdma_mr *mr) rpcrdma_mr_push(mr, &mr->mr_req->rl_free_mrs); } -/* frwr_reset - Place MRs back on the free list +/** + * frwr_reset - Place MRs back on @req's free list * @req: request to reset * * Used after a failed marshal. For FRWR, this means the MRs diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 432557a553e7..a0b071089e15 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -897,6 +897,8 @@ static int rpcrdma_reqs_setup(struct rpcrdma_xprt *r_xprt) static void rpcrdma_req_reset(struct rpcrdma_req *req) { + struct rpcrdma_mr *mr; + /* Credits are valid for only one connection */ req->rl_slot.rq_cong = 0; @@ -906,7 +908,19 @@ static void rpcrdma_req_reset(struct rpcrdma_req *req) rpcrdma_regbuf_dma_unmap(req->rl_sendbuf); rpcrdma_regbuf_dma_unmap(req->rl_recvbuf); - frwr_reset(req); + /* The verbs consumer can't know the state of an MR on the + * req->rl_registered list unless a successful completion + * has occurred, so they cannot be re-used. + */ + while ((mr = rpcrdma_mr_pop(&req->rl_registered))) { + struct rpcrdma_buffer *buf = &mr->mr_xprt->rx_buf; + + spin_lock(&buf->rb_lock); + list_del(&mr->mr_all); + spin_unlock(&buf->rb_lock); + + frwr_mr_release(mr); + } } /* ASSUMPTION: the rb_allreqs list is stable for the duration,