Message ID | 1828884A29C6694DAF28B7E6B8A8237388CA6C27@ORSMSX109.amr.corp.intel.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Hi Sean,

I will re-check by the end of the week; there is some test scheduling issue with our test system, which affects my access times.

Thanks

Andreas

On Mon, 19 Aug 2013 17:10:11 +0000 "Hefty, Sean" <sean.hefty@intel.com> wrote:

> Can you see if the patch below fixes the hang?
>
> Signed-off-by: Sean Hefty <sean.hefty@intel.com>
> ---
>  src/rsocket.c |   11 ++++++++++-
>  1 files changed, 10 insertions(+), 1 deletions(-)
>
> diff --git a/src/rsocket.c b/src/rsocket.c
> index d544dd0..e45b26d 100644
> --- a/src/rsocket.c
> +++ b/src/rsocket.c
> @@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
>
>  		rs = idm_lookup(&idm, fds[i].fd);
>  		if (rs) {
> +			fastlock_acquire(&rs->cq_wait_lock);
>  			if (rs->type == SOCK_STREAM)
>  				rs_get_cq_event(rs);
>  			else
>  				ds_get_cq_event(rs);
> +			fastlock_release(&rs->cq_wait_lock);
>  			fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
>  		} else {
>  			fds[i].revents = rfds[i].revents;
> @@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set *writefds,
>
>  /*
>   * For graceful disconnect, notify the remote side that we're
> - * disconnecting and wait until all outstanding sends complete.
> + * disconnecting and wait until all outstanding sends complete, provided
> + * that the remote side has not sent a disconnect message.
>   */
>  int rshutdown(int socket, int how)
>  {
> @@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
>  	if (rs->state & rs_connected)
>  		rs_process_cq(rs, 0, rs_conn_all_sends_done);
>
> +	if (rs->state & rs_disconnected) {
> +		/* Generate event by flushing receives to unblock rpoll */
> +		ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
> +		rdma_disconnect(rs->cm_id);
> +	}
> +
>  	if ((rs->fd_flags & O_NONBLOCK) && (rs->state & rs_connected))
>  		rs_set_nonblocking(rs, rs->fd_flags);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi,

I have added the patch and re-tested: I still encounter hangs of my application. I am not quite sure whether I hit the same error on the shutdown, because now I don't hit the error always, but only every now and then.

When adding the patch to my code base (git tag v1.0.17) I notice an offset of "-34 lines". Which code base are you using?

Best Regards

Andreas Bluemle

On Tue, 20 Aug 2013 09:21:13 +0200 Andreas Bluemle <andreas.bluemle@itxperts.de> wrote:

> Hi Sean,
>
> I will re-check by the end of the week; there is
> some test scheduling issue with our test system, which
> affects my access times.
>
> Thanks
>
> Andreas
> I have added the patch and re-tested: I still encounter
> hangs of my application. I am not quite sure whether
> I hit the same error on the shutdown because now I don't hit
> the error always, but only every now and then.

I guess this is at least some progress... :/

> When adding the patch to my code base (git tag v1.0.17) I notice
> an offset of "-34 lines". Which code base are you using?

This patch was generated against the tip of the git tree.
Hi Sean,

I tested the patch and unfortunately had the same results as Andreas. About 50% of the time, the rpoll() thread in Ceph still hangs when rshutdown() is called. I saw similar behaviour when increasing the poll time on the pre-patched version, if that's of any relevance.

Thanks

On Tue, Aug 20, 2013 at 11:04 PM, Hefty, Sean <sean.hefty@intel.com> wrote:

>> I have added the patch and re-tested: I still encounter
>> hangs of my application. I am not quite sure whether
>> I hit the same error on the shutdown because now I don't hit
>> the error always, but only every now and then.
>
> I guess this is at least some progress... :/
>
>> When adding the patch to my code base (git tag v1.0.17) I notice
>> an offset of "-34 lines". Which code base are you using?
>
> This patch was generated against the tip of the git tree.
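The scenario reported here is one thread blocked in rpoll() while another thread calls rshutdown() on the same descriptor. For comparison, the following is a minimal sketch of the behaviour one would expect by analogy with ordinary sockets. It deliberately uses plain POSIX calls (socketpair(), poll(), shutdown()) instead of the rsockets API, so it says nothing about rsocket.c itself; the function names poll_thread and shutdown_wakes_poll are hypothetical and exist only for this illustration.

```c
#include <poll.h>
#include <pthread.h>
#include <stdint.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: waits in poll() on the descriptor passed via arg. */
static void *poll_thread(void *arg)
{
	struct pollfd pfd = { .fd = *(int *)arg, .events = POLLIN };

	/* A 5 s timeout turns a missed wakeup into a return value of 0
	 * instead of an indefinite hang. */
	intptr_t ret = poll(&pfd, 1, 5000);

	return (void *)ret;
}

/* Returns what poll() returned in the waiting thread, or -1 on setup failure. */
int shutdown_wakes_poll(void)
{
	int sv[2];
	pthread_t tid;
	void *res;

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv))
		return -1;

	if (pthread_create(&tid, NULL, poll_thread, &sv[0])) {
		close(sv[0]);
		close(sv[1]);
		return -1;
	}

	poll(NULL, 0, 100);		/* sleep 100 ms so the poller can block */
	shutdown(sv[0], SHUT_RDWR);	/* expected to wake the poller */

	pthread_join(tid, &res);
	close(sv[0]);
	close(sv[1]);
	return (int)(intptr_t)res;
}
```

On Linux, the waiting thread is expected to return from poll() with one ready descriptor as soon as shutdown() runs. A return value of 0 would correspond to the kind of missed wakeup described in this thread.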
diff --git a/src/rsocket.c b/src/rsocket.c
index d544dd0..e45b26d 100644
--- a/src/rsocket.c
+++ b/src/rsocket.c
@@ -2948,10 +2948,12 @@ static int rs_poll_events(struct pollfd *rfds, struct pollfd *fds, nfds_t nfds)
 
 		rs = idm_lookup(&idm, fds[i].fd);
 		if (rs) {
+			fastlock_acquire(&rs->cq_wait_lock);
 			if (rs->type == SOCK_STREAM)
 				rs_get_cq_event(rs);
 			else
 				ds_get_cq_event(rs);
+			fastlock_release(&rs->cq_wait_lock);
 			fds[i].revents = rs_poll_rs(rs, fds[i].events, 1, rs_poll_all);
 		} else {
 			fds[i].revents = rfds[i].revents;
@@ -3098,7 +3100,8 @@ int rselect(int nfds, fd_set *readfds, fd_set *writefds,
 
 /*
  * For graceful disconnect, notify the remote side that we're
- * disconnecting and wait until all outstanding sends complete.
+ * disconnecting and wait until all outstanding sends complete, provided
+ * that the remote side has not sent a disconnect message.
  */
 int rshutdown(int socket, int how)
 {
@@ -3138,6 +3141,12 @@ int rshutdown(int socket, int how)
 	if (rs->state & rs_connected)
 		rs_process_cq(rs, 0, rs_conn_all_sends_done);
 
+	if (rs->state & rs_disconnected) {
+		/* Generate event by flushing receives to unblock rpoll */
+		ibv_req_notify_cq(rs->cm_id->recv_cq, 0);
+		rdma_disconnect(rs->cm_id);
+	}
+
 	if ((rs->fd_flags & O_NONBLOCK) && (rs->state & rs_connected))
 		rs_set_nonblocking(rs, rs->fd_flags);
 
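The comment in the last hunk describes the intent as generating an event so that a thread sleeping in rpoll() wakes up. The general idea can be sketched as follows, assuming a Linux eventfd as a hypothetical stand-in for whatever descriptor the poller is waiting on. This is not the rsockets or libibverbs API, and the names ready_now and event_unblocks_poller are made up for the illustration.

```c
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Reports how many descriptors poll() sees as readable, without blocking. */
static int ready_now(int fd)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN };

	return poll(&pfd, 1, 0);
}

/*
 * Returns 0 if the descriptor was idle before the event was generated
 * and readable afterwards, -1 otherwise.
 */
int event_unblocks_poller(void)
{
	uint64_t one = 1;
	int before, after;
	int fd = eventfd(0, 0);

	if (fd < 0)
		return -1;

	/* Nothing pending yet: a blocking poll() would sleep here. */
	before = ready_now(fd);

	/* Generate the event; a poller waiting on fd would now wake up. */
	if (write(fd, &one, sizeof(one)) != (ssize_t)sizeof(one)) {
		close(fd);
		return -1;
	}

	after = ready_now(fd);
	close(fd);
	return (before == 0 && after == 1) ? 0 : -1;
}
```

The sketch only shows why producing an event on the waited-on descriptor releases a poller. Whether the calls added to rshutdown() reliably produce such an event is exactly what the thread is still trying to establish.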
Can you see if the patch below fixes the hang?

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
---
 src/rsocket.c |   11 ++++++++++-
 1 files changed, 10 insertions(+), 1 deletions(-)