Message ID: 4FA345DA4F4AE44899BD2B03EEEC2FA908F8E833@SACEXCMBX04-PRD.hq.netapp.com (mailing list archive)
State: New, archived
"Myklebust, Trond" <Trond.Myklebust@netapp.com> wrote:
| Yes. Can you please see if the following patch fixes the UDP hang?
|
| 8<---------------------------------------------------------------------
| From f39c1bfb5a03e2d255451bff05be0d7255298fa4 Mon Sep 17 00:00:00 2001
| From: Trond Myklebust <Trond.Myklebust@netapp.com>
| Date: Fri, 7 Sep 2012 11:08:50 -0400
| Subject: [PATCH] SUNRPC: Fix a UDP transport regression
|
| Commit 43cedbf0e8dfb9c5610eb7985d5f21263e313802 (SUNRPC: Ensure that
| we grab the XPRT_LOCK before calling xprt_alloc_slot) is causing
| hangs in the case of NFS over UDP mounts.
|
| Since neither the UDP nor the RDMA transport mechanism uses dynamic
| slot allocation, we can skip grabbing the socket lock for those
| transports. Add a new rpc_xprt_op to allow switching between the TCP
| and UDP/RDMA cases.
|
| Note that the NFSv4.1 back channel assigns the slot directly
| through rpc_run_bc_task, so we can ignore that case.
|
| Reported-by: Dick Streefland <dick.streefland@altium.nl>
| Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| Cc: stable@vger.kernel.org [>= 3.1]

This patch appears to fix the issue for me. I cannot reproduce the hang
anymore.
Dick Streefland <dick.streefland@...> writes:
> "Myklebust, Trond" <Trond.Myklebust@...> wrote:
> | Yes. Can you please see if the following patch fixes the UDP hang?
> | [patch snipped]
>
> This patch appears to fix the issue for me. I cannot reproduce the
> hang anymore.

Hi Trond,

Apologies for my late response. Upgrading to kernel 3.5 requires some
effort; I am still working on it.

After applying your patch to the 3.3 kernel, the problem is gone when
using UDP mounts, but it still hangs in the case of NFS over TCP mounts.

I reproduced the problem by executing mm/mtest06_3 (i.e. mmap3) from the
LTP test suite repeatedly; after fewer than 200 iterations, it eventually
ran into the CLOSE_WAIT hang.
I got the following messages after enabling rpc_debug & nfs_debug:

47991 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
47992 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
47993 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
47994 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
47995 0001    -11 cf2910e0   (null)        0 c0243f40 nfsv3 WRITE a:call_reserveresult q:xprt_sending
...

And the hung task information:

INFO: task mmap3:24017 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mmap3           D c0237070     0 24017  23980 0x00000000
[<c0237070>] (__schedule+0x608/0x6d8) from [<c02372d4>] (io_schedule+0x84/0xc0)
[<c02372d4>] (io_schedule+0x84/0xc0) from [<c006f2f0>] (sleep_on_page+0x8/0x10)
[<c006f2f0>] (sleep_on_page+0x8/0x10) from [<c02357dc>] (__wait_on_bit+0x54/0x9c)
[<c02357dc>] (__wait_on_bit+0x54/0x9c) from [<c006f6a8>] (wait_on_page_bit+0xbc/0xd4)
[<c006f6a8>] (wait_on_page_bit+0xbc/0xd4) from [<c00700c4>] (filemap_fdatawait_range+0x88/0x13c)
[<c00700c4>] (filemap_fdatawait_range+0x88/0x13c) from [<c007029c>] (filemap_write_and_wait_range+0x50/0x64)
[<c007029c>] (filemap_write_and_wait_range+0x50/0x64) from [<c00ff280>] (nfs_file_fsync+0x5c/0x154)
[<c00ff280>] (nfs_file_fsync+0x5c/0x154) from [<c00c200c>] (vfs_fsync_range+0x30/0x40)
[<c00c200c>] (vfs_fsync_range+0x30/0x40) from [<c00c203c>] (vfs_fsync+0x20/0x28)
[<c00c203c>] (vfs_fsync+0x20/0x28) from [<c009b880>] (filp_close+0x40/0x84)
[<c009b880>] (filp_close+0x40/0x84) from [<c001b9b0>] (put_files_struct+0xa8/0xfc)
[<c001b9b0>] (put_files_struct+0xa8/0xfc) from [<c001d3f4>] (do_exit+0x278/0x78c)
[<c001d3f4>] (do_exit+0x278/0x78c) from [<c001d9b0>] (do_group_exit+0xa8/0xd4)
[<c001d9b0>] (do_group_exit+0xa8/0xd4) from [<c002a538>] (get_signal_to_deliver+0x48c/0x4f8)
[<c002a538>] (get_signal_to_deliver+0x48c/0x4f8) from [<c000b7a0>] (do_signal+0x88/0x584)
[<c000b7a0>] (do_signal+0x88/0x584) from [<c000bcb4>] (do_notify_resume+0x18/0x50)
[<c000bcb4>] (do_notify_resume+0x18/0x50) from [<c0009418>] (work_pending+0x24/0x28)

--
Regards,
Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, Sep 10, 2012 at 09:00:37AM +0000, Yan-Pai Chen wrote:
> Hi Trond,
>
> Apologies for my late response.
> Upgrading to kernel 3.5 requires some effort. I am still working on it.
>
> After applying your patch on 3.3 kernel, the problem is gone when using
> UDP mounts. But it remains hang in the case of NFS over TCP mounts.
>
> I reproduced the problem by executing mm/mtest06_3 (i.e. mmap3) in the
> LTP test suite repeatedly. About less than 200 times, it eventually ran
> into the CLOSE_WAIT hang. I got the following messages after enabling
> rpc_debug & nfs_debug:
>
> 47991 0001 -11 cf2910e0 (null) 0 c0243f40 nfsv3 WRITE
>       a:call_reserveresult q:xprt_sending
> [...]

Hello!

This problem still bites us rarely, and we've been using TCP NFS for some
time. However, we seem to have narrowed our case down to a very long
storage hang on the knfsd side. If storage never has any problems, we
don't see the NFS client hang. I was going to try to make a test-case by
forcing the server to hang, but I never got around to it. Meanwhile, I've
been running the clients with the debugging patches I posted earlier, and
they always print the "xprt_force_disconnect(): setting XPRT_CLOSE_WAIT"
warning before hanging. If Apache is in sendfile() at the time, it seems
to get stuck forever; otherwise, it might recover.

http://www.spinics.net/lists/linux-nfs/msg29495.html
http://0x.ca/sim/ref/3.2.10/dmesg

I suppose we could try 3.5 at this point.
Simon-
On Tue, 2012-09-11 at 12:40 -0700, Simon Kirby wrote:
> [...]
> This problem still bites us rarely, and we've been using TCP NFS for
> some time. However, our case seems to be narrowed down to a very long
> storage hang on the knfsd side. If storage never has any problems, we
> don't see the NFS client hang. I was going to try to make a test-case
> by forcing the server to hang, but I never got around to this.
> Meanwhile, I've been running the clients with the debugging patches I
> posted earlier, and it always prints the "xprt_force_disconnect():
> setting XPRT_CLOSE_WAIT" warning before hanging. If Apache is in
> sendfile() at the time, it seems to get stuck forever; otherwise, it
> might recover.

Does the "if (test_and_set_bit(XPRT_LOCKED) == 0)" condition immediately
following that succeed, so that queue_work() is called?
> http://www.spinics.net/lists/linux-nfs/msg29495.html
> http://0x.ca/sim/ref/3.2.10/dmesg
>
> I suppose we could try 3.5 at this point.

If you've been keeping up with the 3.2 stable releases, then I wouldn't
expect any major differences in the sunrpc code, but it might be worth a
try in case the networking layer has changed.

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
Myklebust, Trond <Trond.Myklebust@...> writes:
> > I suppose we could try 3.5 at this point.
>
> If you've been keeping up with the 3.2 stable releases, then I wouldn't
> expect any major differences to the sunrpc code, but it might be worth a
> try in case the networking layer has changed.

Hi Trond,

I got the same problem on kernel 3.5 when using TCP mounts. The hung
task info and rpc_debug messages were just the same as what I got on
kernel 3.3.

--
Regards,
Andrew
On Thu, 2012-09-13 at 05:22 +0000, Yan-Pai Chen wrote:
> Hi Trond,
>
> I got the same problem on kernel 3.5 when using TCP mounts.
> The hung task info and rpc_debug messages were just the same as what
> I got on kernel 3.3.

Thanks. Can you try using the patch that I posted on the list yesterday?
It should apply cleanly to the latest 3.5.x kernel.

Cheers,
Trond

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com
Myklebust, Trond <Trond.Myklebust@...> writes:
> Thanks. Can you try using the patch that I posted on the list yesterday?
> It should apply cleanly to the latest 3.5.x kernel.

Hi Trond,

I can still reproduce the same problem after applying your latest patch
on the 3.5.0 kernel, but it works with a later 3.5.x kernel (fbcbe2b3c9).

--
Regards,
Andrew
diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h
index cff40aa..bf8c49f 100644
--- a/include/linux/sunrpc/xprt.h
+++ b/include/linux/sunrpc/xprt.h
@@ -114,6 +114,7 @@ struct rpc_xprt_ops {
 	void		(*set_buffer_size)(struct rpc_xprt *xprt, size_t sndsize, size_t rcvsize);
 	int		(*reserve_xprt)(struct rpc_xprt *xprt, struct rpc_task *task);
 	void		(*release_xprt)(struct rpc_xprt *xprt, struct rpc_task *task);
+	void		(*alloc_slot)(struct rpc_xprt *xprt, struct rpc_task *task);
 	void		(*rpcbind)(struct rpc_task *task);
 	void		(*set_port)(struct rpc_xprt *xprt, unsigned short port);
 	void		(*connect)(struct rpc_task *task);
@@ -281,6 +282,8 @@ void			xprt_connect(struct rpc_task *task);
 void			xprt_reserve(struct rpc_task *task);
 int			xprt_reserve_xprt(struct rpc_xprt *xprt, struct rpc_task *task);
 int			xprt_reserve_xprt_cong(struct rpc_xprt *xprt, struct rpc_task *task);
+void			xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
+void			xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task);
 int			xprt_prepare_transmit(struct rpc_task *task);
 void			xprt_transmit(struct rpc_task *task);
 void			xprt_end_transmit(struct rpc_task *task);
diff --git a/net/sunrpc/xprt.c b/net/sunrpc/xprt.c
index a5a402a..5d7f61d 100644
--- a/net/sunrpc/xprt.c
+++ b/net/sunrpc/xprt.c
@@ -969,11 +969,11 @@ static bool xprt_dynamic_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req)
 	return false;
 }
 
-static void xprt_alloc_slot(struct rpc_task *task)
+void xprt_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
 {
-	struct rpc_xprt	*xprt = task->tk_xprt;
 	struct rpc_rqst *req;
 
+	spin_lock(&xprt->reserve_lock);
 	if (!list_empty(&xprt->free)) {
 		req = list_entry(xprt->free.next, struct rpc_rqst, rq_list);
 		list_del(&req->rq_list);
@@ -994,12 +994,29 @@ static void xprt_alloc_slot(struct rpc_task *task)
 	default:
 		task->tk_status = -EAGAIN;
 	}
+	spin_unlock(&xprt->reserve_lock);
 	return;
 out_init_req:
 	task->tk_status = 0;
 	task->tk_rqstp = req;
 	xprt_request_init(task, xprt);
+	spin_unlock(&xprt->reserve_lock);
+}
+EXPORT_SYMBOL_GPL(xprt_alloc_slot);
+
+void xprt_lock_and_alloc_slot(struct rpc_xprt *xprt, struct rpc_task *task)
+{
+	/* Note: grabbing the xprt_lock_write() ensures that we throttle
+	 * new slot allocation if the transport is congested (i.e. when
+	 * reconnecting a stream transport or when out of socket write
+	 * buffer space).
+	 */
+	if (xprt_lock_write(xprt, task)) {
+		xprt_alloc_slot(xprt, task);
+		xprt_release_write(xprt, task);
+	}
 }
+EXPORT_SYMBOL_GPL(xprt_lock_and_alloc_slot);
 
 static void xprt_free_slot(struct rpc_xprt *xprt, struct rpc_rqst *req)
 {
@@ -1083,20 +1100,9 @@ void xprt_reserve(struct rpc_task *task)
 	if (task->tk_rqstp != NULL)
 		return;
 
-	/* Note: grabbing the xprt_lock_write() here is not strictly needed,
-	 * but ensures that we throttle new slot allocation if the transport
-	 * is congested (e.g. if reconnecting or if we're out of socket
-	 * write buffer space).
-	 */
 	task->tk_timeout = 0;
 	task->tk_status = -EAGAIN;
-	if (!xprt_lock_write(xprt, task))
-		return;
-
-	spin_lock(&xprt->reserve_lock);
-	xprt_alloc_slot(task);
-	spin_unlock(&xprt->reserve_lock);
-	xprt_release_write(xprt, task);
+	xprt->ops->alloc_slot(xprt, task);
 }
 
 static inline __be32 xprt_alloc_xid(struct rpc_xprt *xprt)
diff --git a/net/sunrpc/xprtrdma/transport.c b/net/sunrpc/xprtrdma/transport.c
index 06cdbff..5d9202d 100644
--- a/net/sunrpc/xprtrdma/transport.c
+++ b/net/sunrpc/xprtrdma/transport.c
@@ -713,6 +713,7 @@ static void xprt_rdma_print_stats(struct rpc_xprt *xprt, struct seq_file *seq)
 static struct rpc_xprt_ops xprt_rdma_procs = {
 	.reserve_xprt		= xprt_rdma_reserve_xprt,
 	.release_xprt		= xprt_release_xprt_cong, /* sunrpc/xprt.c */
+	.alloc_slot		= xprt_alloc_slot,
 	.release_request	= xprt_release_rqst_cong,       /* ditto */
 	.set_retrans_timeout	= xprt_set_retrans_timeout_def, /* ditto */
 	.rpcbind		= rpcb_getport_async,	/* sunrpc/rpcb_clnt.c */
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 4005672..a35b8e5 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -2473,6 +2473,7 @@ static void bc_destroy(struct rpc_xprt *xprt)
 static struct rpc_xprt_ops xs_local_ops = {
 	.reserve_xprt		= xprt_reserve_xprt,
 	.release_xprt		= xs_tcp_release_xprt,
+	.alloc_slot		= xprt_alloc_slot,
 	.rpcbind		= xs_local_rpcbind,
 	.set_port		= xs_local_set_port,
 	.connect		= xs_connect,
@@ -2489,6 +2490,7 @@ static struct rpc_xprt_ops xs_udp_ops = {
 	.set_buffer_size	= xs_udp_set_buffer_size,
 	.reserve_xprt		= xprt_reserve_xprt_cong,
 	.release_xprt		= xprt_release_xprt_cong,
+	.alloc_slot		= xprt_alloc_slot,
 	.rpcbind		= rpcb_getport_async,
 	.set_port		= xs_set_port,
 	.connect		= xs_connect,
@@ -2506,6 +2508,7 @@ static struct rpc_xprt_ops xs_udp_ops = {
 static struct rpc_xprt_ops xs_tcp_ops = {
 	.reserve_xprt		= xprt_reserve_xprt,
 	.release_xprt		= xs_tcp_release_xprt,
+	.alloc_slot		= xprt_lock_and_alloc_slot,
 	.rpcbind		= rpcb_getport_async,
 	.set_port		= xs_set_port,
 	.connect		= xs_connect,