Message ID | 20200621145934.4069.31886.stgit@manet.1015granger.net (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | xprtrdma: Ensure connect worker is awoken after connect error | expand |
Anna, please drop this one. It appears to trigger a particularly nasty use-after-free. I'll follow up with a more complete fix soon. (Yes, a wake-up on connect errors is indeed necessary... but the connect worker needs to be re-organized to deal properly with it). > On Jun 21, 2020, at 10:59 AM, Chuck Lever <chuck.lever@oracle.com> wrote: > > From: Dan Aloni <dan@kernelim.com> > > The connect worker sleeps waiting for either successful connection > establishment or an error. Commit e28ce90083f0 ("xprtrdma: kmalloc > rpcrdma_ep separate from rpcrdma_xprt") mistakenly removed the > wake-up in cases when connection establishment fails. > > Fixes: e28ce90083f0 ("xprtrdma: kmalloc rpcrdma_ep separate from rpcrdma_xprt") > Signed-off-by: Dan Aloni <dan@kernelim.com> > [ cel: rebased on recent fixes to 5.8-rc; patch description rewritten ] > Signed-off-by: Chuck Lever <chuck.lever@oracle.com> > --- > net/sunrpc/xprtrdma/verbs.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c > index 2198c8ec8dff..54c6809bf06e 100644 > --- a/net/sunrpc/xprtrdma/verbs.c > +++ b/net/sunrpc/xprtrdma/verbs.c > @@ -296,6 +296,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event) > ep->re_connect_status = -ECONNABORTED; > disconnected: > rpcrdma_force_disconnect(ep); > + wake_up_all(&ep->re_connect_wait); > return rpcrdma_ep_put(ep); > default: > break; > -- Chuck Lever
On Thu, Jun 25, 2020 at 03:19:39PM -0400, Chuck Lever wrote: > Anna, please drop this one. It appears to trigger a particularly nasty > use-after-free. I'll follow up with a more complete fix soon. > > (Yes, a wake-up on connect errors is indeed necessary... but the connect > worker needs to be re-organized to deal properly with it). After sending that patch I also noticed more issues with the management of the EP context. The decref inside the CM handler caused freeing of the EP while connect path still held a reference to it. A KASAN-enabled kernel revealed this easily. I've sent just now a more comprehensive patch to deal with this.
diff --git a/net/sunrpc/xprtrdma/verbs.c b/net/sunrpc/xprtrdma/verbs.c index 2198c8ec8dff..54c6809bf06e 100644 --- a/net/sunrpc/xprtrdma/verbs.c +++ b/net/sunrpc/xprtrdma/verbs.c @@ -296,6 +296,7 @@ rpcrdma_cm_event_handler(struct rdma_cm_id *id, struct rdma_cm_event *event) ep->re_connect_status = -ECONNABORTED; disconnected: rpcrdma_force_disconnect(ep); + wake_up_all(&ep->re_connect_wait); return rpcrdma_ep_put(ep); default: break;