diff mbox

[3.2.5] NFSv3 CLOSE_WAIT hang

Message ID 6cb9.5049fd40.b47c1@altium.nl (mailing list archive)
State New, archived
Headers show

Commit Message

Dick Streefland Sept. 7, 2012, 1:57 p.m. UTC
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
| Clearly the comment is misleading and should be removed. That write lock
| 
| _is_ needed in order to throttle slots on TCP.
| 
| As far as I know, kernel 3.3 is not in stable support any more, so I
| can't help that. Can you reproduce the problem on a 3.5 kernel or
| higher?

I can easily reproduce the hang on the latest kernel (eeea3ac912)
with my script (http://permalink.gmane.org/gmane.linux.nfs/51833).
With this change, it doesn't hang:

Comments

Trond Myklebust Sept. 7, 2012, 2:13 p.m. UTC | #1
> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-
> owner@vger.kernel.org] On Behalf Of Dick Streefland
> Sent: Friday, September 07, 2012 9:57 AM
> To: linux-nfs@vger.kernel.org
> Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang
> 
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> | Clearly the comment is misleading and should be removed. That write
> | lock
> |
> | _is_ needed in order to throttle slots on TCP.
> |
> | As far as I know, kernel 3.3 is not in stable support any more, so I
> | can't help that. Can you reproduce the problem on a 3.5 kernel or
> | higher?
> 
> I can easily reproduce the hang on the latest kernel (eeea3ac912) with my
> script (http://permalink.gmane.org/gmane.linux.nfs/51833).

On TCP?

> With this change, it doesn't hang:

It doesn't matter as long as that change is wrong. That code _is_ needed for TCP.

Now if UDP is broken, then we can talk about a fix for that issue, but that's not going to be the one you propose.

Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dick Streefland Sept. 7, 2012, 2:33 p.m. UTC | #2
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
| > I can easily reproduce the hang on the latest kernel (eeea3ac912) with my
| > script (http://permalink.gmane.org/gmane.linux.nfs/51833).
| 
| On TCP?

No, on UDP (mount options are in that first email).

| > With this change, it doesn't hang:
| 
| It doesn't matter as long as that change is wrong. That code _is_ needed for TCP.
| 
| Now if UDP is broken, then we can talk about a fix for that issue, but that's not going to be the one you propose.

OK, fine with me. My proposal to revert this change was only based on
the remark in the comment that it is not strictly needed. So we need
to differentiate between UDP and TCP then?
diff mbox

Patch

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a5a402a..6e739bf 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1083,20 +1083,9 @@  void xprt_reserve(struct rpc_task *task)
 	if (task->tk_rqstp != NULL)
 		return;
 
-	/* Note: grabbing the xprt_lock_write() here is not strictly needed,
-	 * but ensures that we throttle new slot allocation if the transport
-	 * is congested (e.g. if reconnecting or if we're out of socket
-	 * write buffer space).
-	 */
-	task->tk_timeout = 0;
-	task->tk_status = -EAGAIN;
-	if (!xprt_lock_write(xprt, task))
-		return;
-
 	spin_lock(&xprt->reserve_lock);
 	xprt_alloc_slot(task);
 	spin_unlock(&xprt->reserve_lock);
-	xprt_release_write(xprt, task);
 }
 
 static inline __be32 xprt_alloc_xid(struct rpc_xprt *xprt)