From patchwork Tue Oct 23 16:02:27 2012
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 1631501
From: "Myklebust, Trond"
To: Chris Perl
CC: "linux-nfs@vger.kernel.org"
Subject: Re: RPC Race Condition
Date: Tue, 23 Oct 2012 16:02:27 +0000
Message-ID: <4FA345DA4F4AE44899BD2B03EEEC2FA90928C74A@SACEXCMBX04-PRD.hq.netapp.com>
References: <20121022180339.GC24763@nyc-qws-132.nyc.delacy.com>
 <4FA345DA4F4AE44899BD2B03EEEC2FA909289403@SACEXCMBX04-PRD.hq.netapp.com>
 <20121022202611.GA27191@nyc-qws-132.nyc.delacy.com>
 <4FA345DA4F4AE44899BD2B03EEEC2FA90928AB39@SACEXCMBX04-PRD.hq.netapp.com>
 <20121023134551.GB27191@nyc-qws-132.nyc.delacy.com>
In-Reply-To: <20121023134551.GB27191@nyc-qws-132.nyc.delacy.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, 2012-10-23 at 09:45 -0400, Chris Perl wrote:
> On Mon, Oct 22, 2012 at 09:20:34PM +0000, Myklebust, Trond wrote:
> > > This is what I changed:
> > >
> > > --- a/net/sunrpc/xprtsock.c	2012-08-14 08:47:16.000000000 -0400
> > > +++ b/net/sunrpc/xprtsock.c	2012-10-22 14:50:09.237725498 -0400
> > > @@ -1571,7 +1571,8 @@
> > >  	dprintk("RPC: %s client %p...\n"
> > >  			"RPC: error %d\n",
> > >  			__func__, xprt, sk->sk_err);
> > > -	xprt_wake_pending_tasks(xprt, -EAGAIN);
> > > +	if (!xprt_connecting(xprt))
> > > +		xprt_wake_pending_tasks(xprt, -EAGAIN);
> > >  out:
> > >  	read_unlock_bh(&sk->sk_callback_lock);
> > >  }
> > >
> > > Unfortunately it does not resolve it.  It appears that does stop the wakeup
> > > from happening where I detailed it before, but we still wake up too early.
> > > The following is the call stack from SystemTap catching the call to
> > > `rpc_make_runnable':
> > >
> > >  0xffffffffa0434840 : rpc_make_runnable+0x0/0x80 [sunrpc]
> > >  0xffffffffa0434b5e : rpc_wake_up_task_queue_locked+0x18e/0x270 [sunrpc]
> > >  0xffffffffa0434c8e : rpc_wake_up_status+0x4e/0x80 [sunrpc]
> > >  0xffffffffa042d8ec : xprt_wake_pending_tasks+0x2c/0x30 [sunrpc]
> > >  0xffffffffa042da56 : xprt_disconnect_done+0x46/0x70 [sunrpc]
> > >  0xffffffffa04305b0 : xs_tcp_setup_socket+0x330/0x4c0 [sunrpc]
> > >  0xffffffff8108c760 : worker_thread+0x170/0x2a0 [kernel]
> > >  0xffffffff81091d66 : kthread+0x96/0xa0 [kernel]
> > >  0xffffffff8100c14a : child_rip+0xa/0x20 [kernel]
> > >
> > > However, it's mildly confusing to me since I cannot follow that exact
> > > call stack in the code.  The nearest I can find to that is the
> > > following:
> > >
> > >   rpc_make_runnable
> > >   rpc_wake_up_task_queue_locked
> > >   rpc_wake_up_status
> > >   xprt_wake_pending_tasks
> > >   xprt_disconnect_done
> > >   xs_sock_mark_closed
> > >   xs_abort_connection
> > >   xs_tcp_reuse_connection
> > >   xs_tcp_setup_socket
> > >
> > > As far as I can tell, we get here because the call to `kernel_connect'
> > > from within `xs_abort_connection' by rpciod is returning 0 sometimes
> > > (and EINPROGRESS others).  When it returns 0, we enter this block:
> > >
> > > 	if (!result)
> > > 		xs_sock_mark_closed(&transport->xprt);
> > > 	else
> > > 		dprintk("RPC: AF_UNSPEC connect return code %d\n", result);
> > >
> > > and that eventually gets us woken up via the call stack above.  The end
> > > result is the same, we wake up and move on to `call_transmit' before
> > > rpciod is done connecting the socket.
> > >
> > > I'll do some more tracing and see what else I can come up with.
> > >
> > > Chris
> >
> > OK, then let's just get rid of xs_error_report(). All the cases that we
> > care about will call tcp_done(), which will induce a state change when
> > needed.
> >
> > Cheers,
> >   Trond
> > 8<---------------------------------------------------------------------
> > From 047ee9ea072698f1d35573c1f26dc02d13a4af7f Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust
> > Date: Mon, 22 Oct 2012 17:14:36 -0400
> > Subject: [PATCH] SUNRPC: Get rid of the xs_error_report socket callback
> >
> > Chris Perl reports that we're seeing races between the wakeup call in
> > xs_error_report and the connect attempts. Basically, Chris has shown
> > that in certain circumstances, the call to xs_error_report causes the
> > rpc_task that is responsible for reconnecting to wake up early, thus
> > triggering a disconnect and retry.
> >
> > Since the sk->sk_error_report() calls in the socket layer are always
> > followed by a tcp_done() in the cases where we care about waking up
> > the rpc_tasks, just let the state_change callbacks take responsibility
> > for those wake ups.
> >
> > Reported-by: Chris Perl
> > Signed-off-by: Trond Myklebust
> > ---
> >  net/sunrpc/xprtsock.c | 25 -------------------------
> >  1 file changed, 25 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index aaaadfb..89bde21 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -254,7 +254,6 @@ struct sock_xprt {
> >  	void			(*old_data_ready)(struct sock *, int);
> >  	void			(*old_state_change)(struct sock *);
> >  	void			(*old_write_space)(struct sock *);
> > -	void			(*old_error_report)(struct sock *);
> >  };
> >
> >  /*
> > @@ -781,7 +780,6 @@ static void xs_save_old_callbacks(struct sock_xprt *transport, struct sock *sk)
> >  	transport->old_data_ready = sk->sk_data_ready;
> >  	transport->old_state_change = sk->sk_state_change;
> >  	transport->old_write_space = sk->sk_write_space;
> > -	transport->old_error_report = sk->sk_error_report;
> >  }
> >
> >  static void xs_restore_old_callbacks(struct sock_xprt *transport, struct sock *sk)
> > @@ -789,7 +787,6 @@ static void xs_restore_old_callbacks(struct sock_xprt *transport, struct sock *sk)
> >  	sk->sk_data_ready = transport->old_data_ready;
> >  	sk->sk_state_change = transport->old_state_change;
> >  	sk->sk_write_space = transport->old_write_space;
> > -	sk->sk_error_report = transport->old_error_report;
> >  }
> >
> >  static void xs_reset_transport(struct sock_xprt *transport)
> > @@ -1540,25 +1537,6 @@ static void xs_tcp_state_change(struct sock *sk)
> >  	read_unlock_bh(&sk->sk_callback_lock);
> >  }
> >
> > -/**
> > - * xs_error_report - callback mainly for catching socket errors
> > - * @sk: socket
> > - */
> > -static void xs_error_report(struct sock *sk)
> > -{
> > -	struct rpc_xprt *xprt;
> > -
> > -	read_lock_bh(&sk->sk_callback_lock);
> > -	if (!(xprt = xprt_from_sock(sk)))
> > -		goto out;
> > -	dprintk("RPC: %s client %p...\n"
> > -			"RPC: error %d\n",
> > -			__func__, xprt, sk->sk_err);
> > -	xprt_wake_pending_tasks(xprt, -EAGAIN);
> > -out:
> > -	read_unlock_bh(&sk->sk_callback_lock);
> > -}
> > -
> >  static void xs_write_space(struct sock *sk)
> >  {
> >  	struct socket *sock;
> > @@ -1858,7 +1836,6 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
> >  	sk->sk_user_data = xprt;
> >  	sk->sk_data_ready = xs_local_data_ready;
> >  	sk->sk_write_space = xs_udp_write_space;
> > -	sk->sk_error_report = xs_error_report;
> >  	sk->sk_allocation = GFP_ATOMIC;
> >
> >  	xprt_clear_connected(xprt);
> > @@ -1983,7 +1960,6 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> >  	sk->sk_user_data = xprt;
> >  	sk->sk_data_ready = xs_udp_data_ready;
> >  	sk->sk_write_space = xs_udp_write_space;
> > -	sk->sk_error_report = xs_error_report;
> >  	sk->sk_no_check = UDP_CSUM_NORCV;
> >  	sk->sk_allocation = GFP_ATOMIC;
> >
> > @@ -2098,7 +2074,6 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> >  	sk->sk_data_ready = xs_tcp_data_ready;
> >  	sk->sk_state_change = xs_tcp_state_change;
> >  	sk->sk_write_space = xs_tcp_write_space;
> > -	sk->sk_error_report = xs_error_report;
> >  	sk->sk_allocation = GFP_ATOMIC;
> >
> >  	/* socket options */
>
> Hi Trond,
>
> First, thanks for being so responsive!  I never expected to get such
> quick feedback and help.
>
> I think perhaps you misread my last email.  Adding the check for
> `xprt_connecting(xprt)' to `xs_error_report' _did_ stop the early wake
> up from happening at that point in the code.  Removing the entire
> callback with your patch works just as well.

OK. I did misunderstand that.

> I have confirmed that I can still reproduce the problem with this
> patch.
>
> Now that we're not waking up early because of `xs_error_report', we're
> waking up early in another part of the code as I mentioned above.  I'm
> not entirely clear about all the details here yet.
>
> What I do know is that we're woken up with `tk_status' set to EAGAIN, and
> since `call_connect_status' doesn't do anything special for that, we move
> on to `call_transmit'.  Meanwhile, rpciod is still trying to connect the
> socket (eventually succeeding) and so we're racing.
>
> Question: should `call_connect_status' treat EAGAIN specially and reset
> `tk_action' to `call_connect' again (rather than just moving on to
> `call_transmit')?  I don't think this alone would fix things properly,
> but it seems logical to me.  Perhaps I'm missing something.

call_transmit() should normally just cause the rpc_task to detect that
the connection is down, and should cause it to go to call_connect in
order to fix everything. If it is causing a race, then I'm thinking that
is probably due to the call to xs_tcp_shutdown() when we get an EPIPE
error in xs_tcp_send_request(). That shutdown seems to be needed only in
the case when we're in TCP_CLOSE_WAIT, in which case what we really want
is to force the rpc_task to go to xprt_connect, which means clearing
xprt_connected().

Hmm.... Can you try applying the attached 2 patches?

Cheers,
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

From ed42390cf42a0ea483d7a8d7e2e2b397ec0a8f00 Mon Sep 17 00:00:00 2001
From: Trond Myklebust
Date: Tue, 23 Oct 2012 11:40:02 -0400
Subject: [PATCH 2/2] Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."

This reverts commit 55420c24a0d4d1fce70ca713f84aa00b6b74a70e.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT, the
deadlock described in this commit is no longer possible. Instead, the
resulting call to xs_tcp_shutdown() can interfere with pending
reconnection attempts.

Reported-by: Chris Perl
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org
---
 net/sunrpc/xprtsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 6e6967d..7e2dd0d 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -737,10 +737,10 @@ static int xs_tcp_send_request(struct rpc_task *task)
 	default:
 		dprintk("RPC: sendmsg returned unrecognized error %d\n",
 			-status);
 	case -ECONNRESET:
-	case -EPIPE:
 		xs_tcp_shutdown(xprt);
 	case -ECONNREFUSED:
 	case -ENOTCONN:
+	case -EPIPE:
 		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
 	}
-- 
1.7.11.7
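
For reference, a minimal sketch of the change Chris asks about above:
having call_connect_status() treat EAGAIN as "transport not connected
yet" and loop back to call_connect instead of falling through to
call_transmit.  This is not one of the attached patches; it is only an
illustrative assumption built on the rpc_task fields and helpers of that
era (tk_status, tk_action, call_connect, call_transmit, rpc_exit), with
the handling of the other status codes elided.

/*
 * Illustrative sketch only, not from the posted patches: a possible
 * shape for call_connect_status() in net/sunrpc/clnt.c if EAGAIN were
 * handled by retrying the connect step.
 */
static void
call_connect_status(struct rpc_task *task)
{
        struct rpc_clnt *clnt = task->tk_client;
        int status = task->tk_status;

        task->tk_status = 0;
        if (status == -EAGAIN) {
                /* Transport not connected yet: go back to call_connect
                 * rather than racing rpciod by moving on to transmit. */
                task->tk_action = call_connect;
                return;
        }
        if (status >= 0) {
                clnt->cl_stats->netreconn++;
                task->tk_action = call_transmit;
                return;
        }
        /* Remaining error cases elided in this sketch. */
        rpc_exit(task, status);
}

As Chris himself notes, this alone would probably not be a complete fix;
the attached patches instead address the xs_tcp_shutdown() interference
with pending reconnection attempts directly.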