From patchwork Tue Oct 23 21:58:35 2012
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 1635061
From: "Myklebust, Trond"
To: Chris Perl
Cc: "linux-nfs@vger.kernel.org"
Subject: Re: RPC Race Condition
Date: Tue, 23 Oct 2012 21:58:35 +0000
Message-ID: <4FA345DA4F4AE44899BD2B03EEEC2FA90928DCDD@SACEXCMBX04-PRD.hq.netapp.com>
In-Reply-To: <20121023200855.GC27191@nyc-qws-132.nyc.delacy.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, 2012-10-23 at 16:08 -0400, Chris Perl wrote:
> On Tue, Oct 23, 2012 at 04:02:27PM +0000, Myklebust, Trond wrote:
> > call_transmit() should normally just cause the rpc_task to detect that
> > the connection is down, and should cause it to go to call_connect in
> > order to fix everything.
> >
> > If it is causing a race, then I'm thinking that is probably due to the
> > call to xs_tcp_shutdown() when we get an EPIPE error in
> > xs_tcp_send_request(). That shutdown seems to be needed only in the case
> > when we're in TCP_CLOSE_WAIT, in which case what we really want is to
> > force the rpc_task to go to xprt_connect, which means clearing
> > xprt_connected().
> >
> > Hmm.... Can you try applying the attached 2 patches?
>
> I rebuilt with those two patches as well as the first one that removed
> the `xs_error_report' callback.
>
> Good news, it works!
>
> However, looking at the interactions, it still seems a little bit racy.
> Clearly you know more about this than I do, so I'll defer to your better
> judgement, but let me lay out what I believe is happening:
>
> Stat thread:
>   - woken up with tk_status set to EAGAIN via xs_tcp_state_change
>   - tries call_transmit again, which calls xs_tcp_send_request, which
>     comes back with ECONNRESET (b/c this is what sk_err is)
>   - call_status sees this, sleeps for 3 seconds, and then sets
>     tk_action to call_bind
>   - call_bind
>   - call_connect, queues execution of xs_tcp_setup_socket with rpciod
>     and goes to sleep
>
> rpciod:
>   - starts doing its thing, but wakes up our other thread before it's
>     done trying to connect the socket (or failing to do so) via the
>     call stack mentioned before (via xprt_disconnect_done)
>
> Stat thread:
>   - tk_status is set to EAGAIN, so call_connect_status moves on to
>     call_transmit
>   - call_transmit calls xs_tcp_send_request again, which finds the
>     socket with sk_shutdown != 0 and so returns EPIPE
>   - call_status sees the EPIPE and sets tk_action back to call_bind
>
> rpciod:
>   - continues on and updates the state of the socket to TCP_SYN_SENT
>   - when the connection is established, xs_tcp_state_change takes care
>     of marking the xprt connected
>
> Stat thread:
>   - call_bind
>   - call_connect, however this time it schedules no work because it
>     finds that the transport is already connected
>   - call_transmit succeeds; we recover
>
> This is how the recovery went when I was testing, but I could imagine a
> situation where different timing comes into play and we wind up asking
> rpciod to execute xs_tcp_setup_socket twice (because the first connect
> hasn't been established yet). I haven't thought too hard about what
> that would mean, but I imagine it probably wouldn't be good.
>
> Like I said, you know way more about this than I do; I just wanted to
> point this out. It's entirely possible that I'm missing or
> misunderstanding something.
The only problem there would be the call to xs_sock_mark_closed() in
xs_abort_connection. As far as I can tell, all we want to do there is
clear all those flags. How about the following?

Cheers
  Trond

8<------------------------------------------------------------------
From c2d1bd4ab6c85cdbdf3380de2269817092fb9a11 Mon Sep 17 00:00:00 2001
From: Trond Myklebust
Date: Tue, 23 Oct 2012 17:50:07 -0400
Subject: [PATCH] SUNRPC: Prevent races in xs_abort_connection()

The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks on
the xprt->pending rpc_wait_queue. In particular it will cause an early
wake up of the task that called xprt_connect().

All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed() into
a helper that can be called by xs_abort_connection()

Reported-by: Chris Perl
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org
---
 net/sunrpc/xprtsock.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7e2dd0d..1f105c2 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1453,7 +1453,7 @@ static void xs_tcp_cancel_linger_timeout(struct rpc_xprt *xprt)
 	xprt_clear_connecting(xprt);
 }
 
-static void xs_sock_mark_closed(struct rpc_xprt *xprt)
+static void xs_sock_reset_connection_flags(struct rpc_xprt *xprt)
 {
 	smp_mb__before_clear_bit();
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
@@ -1461,6 +1461,11 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
 	clear_bit(XPRT_CLOSING, &xprt->state);
 	smp_mb__after_clear_bit();
+}
+
+static void xs_sock_mark_closed(struct rpc_xprt *xprt)
+{
+	xs_sock_reset_connection_flags(xprt);
 	/* Mark transport as closed and wake up all pending tasks */
 	xprt_disconnect_done(xprt);
 }
@@ -2051,10 +2056,8 @@ static void xs_abort_connection(struct sock_xprt *transport)
 	any.sa_family = AF_UNSPEC;
 	result = kernel_connect(transport->sock, &any, sizeof(any), 0);
 	if (!result)
-		xs_sock_mark_closed(&transport->xprt);
-	else
-		dprintk("RPC:       AF_UNSPEC connect return code %d\n",
-				result);
+		xs_sock_reset_connection_flags(&transport->xprt);
+	dprintk("RPC:       AF_UNSPEC connect return code %d\n", result);
 }
 
 static void xs_tcp_reuse_connection(struct sock_xprt *transport)
-- 
1.7.11.7

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com