[3.2.5] NFSv3 CLOSE_WAIT hang

Message ID	6cb9.5049fd40.b47c1@altium.nl (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-nfs-owner@vger.kernel.org> To: linux-nfs@vger.kernel.org Mime-Version: 1.0 Reply-To: dick.streefland@altium.nl (Dick Streefland) References: <20110909194509.GB6195@hostway.ca> <1315610322.17611.112.camel@lade.trondhjem.org> <20111020190334.GA22772@hostway.ca> <20120301225524.GB27595@hostway.ca> <20120302002511.GA4495@hostway.ca> <20120302184918.GA20702@hostway.ca> <loom.20120905T093214-245@post.gmane.org> <loom.20120905T093214-245@post.gmane.org> <4FA345DA4F4AE44899BD2B03EEEC2FA908F86381@SACEXCMBX04-PRD.hq.netapp.com> From: dick.streefland@altium.nl (Dick Streefland) Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang Content-Type: text/plain; charset=us-ascii NNTP-Posting-Host: 172.17.1.66 Message-ID: <6cb9.5049fd40.b47c1@altium.nl> Date: Fri, 07 Sep 2012 13:57:20 -0000 From: rnews@altium.nl Sender: linux-nfs-owner@vger.kernel.org Precedence: bulk

Message ID

6cb9.5049fd40.b47c1@altium.nl (mailing list archive)

State

New, archived

Headers

To: linux-nfs@vger.kernel.org
Mime-Version: 1.0
Reply-To: dick.streefland@altium.nl (Dick Streefland)
References: <20110909194509.GB6195@hostway.ca>
	<1315610322.17611.112.camel@lade.trondhjem.org>
	<20111020190334.GA22772@hostway.ca>
	<20120301225524.GB27595@hostway.ca>
	<20120302002511.GA4495@hostway.ca>
	<20120302184918.GA20702@hostway.ca>
	<loom.20120905T093214-245@post.gmane.org>
	<loom.20120905T093214-245@post.gmane.org>
	<4FA345DA4F4AE44899BD2B03EEEC2FA908F86381@SACEXCMBX04-PRD.hq.netapp.com>
From: dick.streefland@altium.nl (Dick Streefland)
Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang
Content-Type: text/plain; charset=us-ascii
NNTP-Posting-Host: 172.17.1.66
Message-ID: <6cb9.5049fd40.b47c1@altium.nl>
Date: Fri, 07 Sep 2012 13:57:20 -0000
From: rnews@altium.nl
Sender: linux-nfs-owner@vger.kernel.org
Precedence: bulk

Commit Message

Dick Streefland Sept. 7, 2012, 1:57 p.m. UTC

"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
| Clearly the comment is misleading and should be removed. That write lock
| 
| _is_ needed in order to throttle slots on TCP.
| 
| As far as I know, kernel 3.3 is not in stable support any more, so I
| can't help that. Can you reproduce the problem on a 3.5 kernel or
| higher?

I can easily reproduce the hang on the latest kernel (eeea3ac912)
with my script (http://permalink.gmane.org/gmane.linux.nfs/51833).
With this change, it doesn't hang:

Comments

Trond Myklebust Sept. 7, 2012, 2:13 p.m. UTC | #1

> -----Original Message-----
> From: linux-nfs-owner@vger.kernel.org [mailto:linux-nfs-
> owner@vger.kernel.org] On Behalf Of Dick Streefland
> Sent: Friday, September 07, 2012 9:57 AM
> To: linux-nfs@vger.kernel.org
> Subject: Re: [3.2.5] NFSv3 CLOSE_WAIT hang
> 
> "Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
> | Clearly the comment is misleading and should be removed. That write
> | lock
> |
> | _is_ needed in order to throttle slots on TCP.
> |
> | As far as I know, kernel 3.3 is not in stable support any more, so I
> | can't help that. Can you reproduce the problem on a 3.5 kernel or
> | higher?
> 
> I can easily reproduce the hang on the latest kernel (eeea3ac912) with my
> script (http://permalink.gmane.org/gmane.linux.nfs/51833).

On TCP?

> With this change, it doesn't hang:

It doesn't matter as long as that change is wrong. That code _is_ needed for TCP.

Now if UDP is broken, then we can talk about a fix for that issue, but that's not going to be the one you propose.

Trond
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Dick Streefland Sept. 7, 2012, 2:33 p.m. UTC | #2

"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
| > I can easily reproduce the hang on the latest kernel (eeea3ac912) with my
| > script (http://permalink.gmane.org/gmane.linux.nfs/51833).
| 
| On TCP?

No, on UDP (mount options are in that first email).

| > With this change, it doesn't hang:
| 
| It doesn't matter as long as that change is wrong. That code _is_ needed for TCP.
| 
| Now if UDP is broken, then we can talk about a fix for that issue, but that's not going to be the one you propose.

OK, fine with me. My proposal to revert this change was only based on
the remark in the comment that it is not strictly needed. So we need
to differentiate between UDP and TCP then?

diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a5a402a..6e739bf 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -1083,20 +1083,9 @@  void xprt_reserve(struct rpc_task *task)
 	if (task->tk_rqstp != NULL)
 		return;
 
-	/* Note: grabbing the xprt_lock_write() here is not strictly needed,
-	 * but ensures that we throttle new slot allocation if the transport
-	 * is congested (e.g. if reconnecting or if we're out of socket
-	 * write buffer space).
-	 */
-	task->tk_timeout = 0;
-	task->tk_status = -EAGAIN;
-	if (!xprt_lock_write(xprt, task))
-		return;
-
 	spin_lock(&xprt->reserve_lock);
 	xprt_alloc_slot(task);
 	spin_unlock(&xprt->reserve_lock);
-	xprt_release_write(xprt, task);
 }
 
 static inline __be32 xprt_alloc_xid(struct rpc_xprt *xprt)

[3.2.5] NFSv3 CLOSE_WAIT hang

Commit Message

Comments

Patch