Message ID | 1539543498-29105-4-git-send-email-jsimmons@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | lustre: more assorted fixes for lustre 2.10 | expand |
On Sun, Oct 14 2018, James Simmons wrote: > From: Lai Siyao <lai.siyao@whamcloud.com> > > ptlrpc_client_wake_req() misses a memory barrier, which may cause > strange errors. > > Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> > WC-bug-id: https://jira.whamcloud.com/browse/LU-8935 > Reviewed-on: https://review.whamcloud.com/26583 > Reviewed-by: Andreas Dilger <adilger@whamcloud.com> > Reviewed-by: Wang Shilong <wshilong@ddn.com> > Reviewed-by: Oleg Drokin <green@whamcloud.com> > Signed-off-by: James Simmons <jsimmons@infradead.org> > --- > drivers/staging/lustre/lustre/include/lustre_net.h | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h > index ce7e98c..468a03e 100644 > --- a/drivers/staging/lustre/lustre/include/lustre_net.h > +++ b/drivers/staging/lustre/lustre/include/lustre_net.h > @@ -2211,6 +2211,8 @@ static inline int ptlrpc_status_ntoh(int n) > static inline void > ptlrpc_client_wake_req(struct ptlrpc_request *req) > { > + /* ensure ptlrpc_register_bulk see rq_resend as set. */ > + smp_mb(); > if (!req->rq_set) > wake_up(&req->rq_reply_waitq); > else It is good that this memory barrier has a comment, but the comment isn't very helpful. There is no matching memory barrier in ptlrpc_register_bulk(), so it isn't clear what sequencing is important. And ptl_send_rpc() tests ->rq_resend *before* ptlrpc_register_bulk() is called (which also tests it). Presumably these should see that same value? So why does the comment refer to ptlrpc_register_bulk() instead of ptl_send_rpc() ?? It all seems rather confusing, so it is very hard to be sure that the code is now correct. Is someone able to explain? Thanks, NeilBrown > -- > 1.8.3.1
> On Sun, Oct 14 2018, James Simmons wrote: > > > From: Lai Siyao <lai.siyao@whamcloud.com> > > > > ptlrpc_client_wake_req() misses a memory barrier, which may cause > > strange errors. > > > > Signed-off-by: Lai Siyao <lai.siyao@whamcloud.com> > > WC-bug-id: https://jira.whamcloud.com/browse/LU-8935 > > Reviewed-on: https://review.whamcloud.com/26583 > > Reviewed-by: Andreas Dilger <adilger@whamcloud.com> > > Reviewed-by: Wang Shilong <wshilong@ddn.com> > > Reviewed-by: Oleg Drokin <green@whamcloud.com> > > Signed-off-by: James Simmons <jsimmons@infradead.org> > > --- > > drivers/staging/lustre/lustre/include/lustre_net.h | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h > > index ce7e98c..468a03e 100644 > > --- a/drivers/staging/lustre/lustre/include/lustre_net.h > > +++ b/drivers/staging/lustre/lustre/include/lustre_net.h > > @@ -2211,6 +2211,8 @@ static inline int ptlrpc_status_ntoh(int n) > > static inline void > > ptlrpc_client_wake_req(struct ptlrpc_request *req) > > { > > + /* ensure ptlrpc_register_bulk see rq_resend as set. */ > > + smp_mb(); > > if (!req->rq_set) > > wake_up(&req->rq_reply_waitq); > > else > > It is good that this memory barrier has a comment, but the comment isn't > very helpful. > There is no matching memory barrier in ptlrpc_register_bulk(), so it > isn't clear what sequencing is important. > > And ptl_send_rpc() tests ->rq_resend *before* ptlrpc_register_bulk() is > called (which also tests it). Presumably these should see that same > value? So why does the comment refer to ptlrpc_register_bulk() instead > of ptl_send_rpc() ?? > > It all seems rather confusing, so it is very hard to be sure that the > code is now correct. > Is someone able to explain? I wasn't going on much here. While the linux kernel request comments to place with memory barriers lustre developers tend to never leave an explaination on why a memory barrier was needed. In this case I examined the original JIRA ticket to find in one of the comments: "It looks like ptlrpc_client_wake_req() misses a memory barrier, which may cause ptlrpc_resend_req() wake up ptlrpc_send_rpc -> ptlrpc_register_bulk, while the latter doesn't see rq_resend set." I attempted to add that as a comment. This is all I had to go one. Now Lai is CC to this email so maybe he remembers what it was all about. > Thanks, > NeilBrown > > > > > -- > > 1.8.3.1 >
diff --git a/drivers/staging/lustre/lustre/include/lustre_net.h b/drivers/staging/lustre/lustre/include/lustre_net.h index ce7e98c..468a03e 100644 --- a/drivers/staging/lustre/lustre/include/lustre_net.h +++ b/drivers/staging/lustre/lustre/include/lustre_net.h @@ -2211,6 +2211,8 @@ static inline int ptlrpc_status_ntoh(int n) static inline void ptlrpc_client_wake_req(struct ptlrpc_request *req) { + /* ensure ptlrpc_register_bulk see rq_resend as set. */ + smp_mb(); if (!req->rq_set) wake_up(&req->rq_reply_waitq); else