From patchwork Tue Oct 23 21:58:35 2012
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 1635061
From: "Myklebust, Trond"
To: Chris Perl
Cc: "linux-nfs@vger.kernel.org"
Subject: Re: RPC Race Condition
Date: Tue, 23 Oct 2012 21:58:35 +0000
Message-ID: <4FA345DA4F4AE44899BD2B03EEEC2FA90928DCDD@SACEXCMBX04-PRD.hq.netapp.com>
In-Reply-To: <20121023200855.GC27191@nyc-qws-132.nyc.delacy.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, 2012-10-23 at 16:08 -0400, Chris Perl wrote:
> On Tue, Oct 23, 2012 at 04:02:27PM +0000, Myklebust, Trond wrote:
> > call_transmit() should normally just cause the rpc_task to detect that
> > the connection is down, and should cause it to go to call_connect in
> > order to fix everything.
> >
> > If it is causing a race, then I'm thinking that is probably due to the
> > call to xs_tcp_shutdown() when we get an EPIPE error in
> > xs_tcp_send_request(). That shutdown seems to be needed only in the case
> > when we're in TCP_CLOSE_WAIT, in which case what we really want is to
> > force the rpc_task to go to xprt_connect, which means clearing
> > xprt_connected().
> >
> > Hmm.... Can you try applying the attached 2 patches?
>
> I rebuilt with those two patches as well as the first one that removed
> the `xs_error_report' callback.
>
> Good news, it works!
>
> However, looking at the interactions, it still seems a little bit racy.
> Clearly you know more about this than I do, so I'll defer to your better
> judgement, but let me lay out what I believe is happening:
>
> Stat thread:
>   - woken up with tk_status set to EAGAIN via xs_tcp_state_change
>   - tries call_transmit again, which calls xs_tcp_send_request, which
>     comes back with ECONNRESET (b/c this is what sk_err is)
>   - call_status sees this, sleeps for 3 seconds, and then sets
>     tk_action to call_bind
>   - call_bind
>   - call_connect, queues execution of xs_tcp_setup_socket with rpciod
>     and goes to sleep
>
> rpciod:
>   - starts doing its thing, but wakes up our other thread before it's
>     done trying to connect the socket (or failing to do so) via the
>     call stack mentioned before (via xprt_disconnect_done)
>
> Stat thread:
>   - tk_status is set to EAGAIN, so call_connect_status moves on to
>     call_transmit
>   - call_transmit calls xs_tcp_send_request again, which finds the
>     socket with sk_shutdown != 0 and so returns EPIPE
>   - call_status sees the EPIPE and sets tk_action back to call_bind
>
> rpciod:
>   - continues on and updates the state of the socket to TCP_SYN_SENT
>   - when the connection is established, xs_tcp_state_change takes care
>     of marking the xprt connected
>
> Stat thread:
>   - call_bind
>   - call_connect, however this time it schedules no work because it
>     finds that the transport is already connected
>   - call_transmit succeeds; we recover
>
> This is how the recovery went when I was testing, but I could imagine a
> situation where different timing comes into play and we wind up asking
> rpciod to execute xs_tcp_setup_socket twice (because the first connect
> hasn't been established yet). I haven't thought too hard about what
> that would mean, but I imagine it probably wouldn't be good.
>
> Like I said, you know way more about this than I do; I just wanted to
> point this out. It's entirely possible that I'm missing or
> misunderstanding something.
The only problem there would be the call to xs_sock_mark_closed() in
xs_abort_connection. As far as I can tell, all we want to do there is
clear all those flags. How about the following?

Cheers
  Trond

8<------------------------------------------------------------------
From c2d1bd4ab6c85cdbdf3380de2269817092fb9a11 Mon Sep 17 00:00:00 2001
From: Trond Myklebust
Date: Tue, 23 Oct 2012 17:50:07 -0400
Subject: [PATCH] SUNRPC: Prevent races in xs_abort_connection()

The call to xprt_disconnect_done() that is triggered by a successful
connection reset will trigger another automatic wakeup of all tasks on
the xprt->pending rpc_wait_queue. In particular it will cause an early
wake up of the task that called xprt_connect().

All we really want to do here is clear all the socket-specific state
flags, so we split that functionality out of xs_sock_mark_closed() into
a helper that can be called by xs_abort_connection()

Reported-by: Chris Perl
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org
---
 net/sunrpc/xprtsock.c | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 7e2dd0d..1f105c2 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1453,7 +1453,7 @@ static void xs_tcp_cancel_linger_timeout(struct rpc_xprt *xprt)
 	xprt_clear_connecting(xprt);
 }
 
-static void xs_sock_mark_closed(struct rpc_xprt *xprt)
+static void xs_sock_reset_connection_flags(struct rpc_xprt *xprt)
 {
 	smp_mb__before_clear_bit();
 	clear_bit(XPRT_CONNECTION_ABORT, &xprt->state);
@@ -1461,6 +1461,11 @@ static void xs_sock_mark_closed(struct rpc_xprt *xprt)
 	clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
 	clear_bit(XPRT_CLOSING, &xprt->state);
 	smp_mb__after_clear_bit();
+}
+
+static void xs_sock_mark_closed(struct rpc_xprt *xprt)
+{
+	xs_sock_reset_connection_flags(xprt);
 	/* Mark transport as closed and wake up all pending tasks */
 	xprt_disconnect_done(xprt);
 }
@@ -2051,10 +2056,8 @@ static void xs_abort_connection(struct sock_xprt *transport)
 	any.sa_family = AF_UNSPEC;
 	result = kernel_connect(transport->sock, &any, sizeof(any), 0);
 	if (!result)
-		xs_sock_mark_closed(&transport->xprt);
-	else
-		dprintk("RPC:       AF_UNSPEC connect return code %d\n",
-				result);
+		xs_sock_reset_connection_flags(&transport->xprt);
+	dprintk("RPC:       AF_UNSPEC connect return code %d\n", result);
 }
 
 static void xs_tcp_reuse_connection(struct sock_xprt *transport)
-- 
1.7.11.7

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com