From patchwork Tue Oct 23 16:02:27 2012
X-Patchwork-Submitter: Trond Myklebust
X-Patchwork-Id: 1631501
From: "Myklebust, Trond"
To: Chris Perl
CC: "linux-nfs@vger.kernel.org"
Subject: Re: RPC Race Condition
Date: Tue, 23 Oct 2012 16:02:27 +0000
Message-ID: <4FA345DA4F4AE44899BD2B03EEEC2FA90928C74A@SACEXCMBX04-PRD.hq.netapp.com>
References: <20121022180339.GC24763@nyc-qws-132.nyc.delacy.com>
 <4FA345DA4F4AE44899BD2B03EEEC2FA909289403@SACEXCMBX04-PRD.hq.netapp.com>
 <20121022202611.GA27191@nyc-qws-132.nyc.delacy.com>
 <4FA345DA4F4AE44899BD2B03EEEC2FA90928AB39@SACEXCMBX04-PRD.hq.netapp.com>
 <20121023134551.GB27191@nyc-qws-132.nyc.delacy.com>
In-Reply-To: <20121023134551.GB27191@nyc-qws-132.nyc.delacy.com>
X-Mailing-List: linux-nfs@vger.kernel.org

On Tue, 2012-10-23 at 09:45 -0400, Chris Perl wrote:
> On Mon, Oct 22, 2012 at 09:20:34PM +0000, Myklebust, Trond wrote:
> > > This is what I changed:
> > >
> > > --- a/net/sunrpc/xprtsock.c	2012-08-14 08:47:16.000000000 -0400
> > > +++ b/net/sunrpc/xprtsock.c	2012-10-22 14:50:09.237725498 -0400
> > > @@ -1571,7 +1571,8 @@
> > >  	dprintk("RPC: %s client %p...\n"
> > >  			"RPC: error %d\n",
> > >  			__func__, xprt, sk->sk_err);
> > > -	xprt_wake_pending_tasks(xprt, -EAGAIN);
> > > +	if (!xprt_connecting(xprt))
> > > +		xprt_wake_pending_tasks(xprt, -EAGAIN);
> > >  out:
> > >  	read_unlock_bh(&sk->sk_callback_lock);
> > >  }
> > >
> > > Unfortunately it does not resolve it.  It appears that does stop the wakeup
> > > from happening where I detailed it before, but we still wake up too early.
> > > The following is the call stack from SystemTap catching the call to
> > > `rpc_make_runnable':
> > >
> > >  0xffffffffa0434840 : rpc_make_runnable+0x0/0x80 [sunrpc]
> > >  0xffffffffa0434b5e : rpc_wake_up_task_queue_locked+0x18e/0x270 [sunrpc]
> > >  0xffffffffa0434c8e : rpc_wake_up_status+0x4e/0x80 [sunrpc]
> > >  0xffffffffa042d8ec : xprt_wake_pending_tasks+0x2c/0x30 [sunrpc]
> > >  0xffffffffa042da56 : xprt_disconnect_done+0x46/0x70 [sunrpc]
> > >  0xffffffffa04305b0 : xs_tcp_setup_socket+0x330/0x4c0 [sunrpc]
> > >  0xffffffff8108c760 : worker_thread+0x170/0x2a0 [kernel]
> > >  0xffffffff81091d66 : kthread+0x96/0xa0 [kernel]
> > >  0xffffffff8100c14a : child_rip+0xa/0x20 [kernel]
> > >
> > > However, it's mildly confusing to me since I cannot follow that exact
> > > call stack in the code.  The nearest I can find to that is the
> > > following:
> > >
> > >   rpc_make_runnable
> > >   rpc_wake_up_task_queue_locked
> > >   rpc_wake_up_status
> > >   xprt_wake_pending_tasks
> > >   xprt_disconnect_done
> > >   xs_sock_mark_closed
> > >   xs_abort_connection
> > >   xs_tcp_reuse_connection
> > >   xs_tcp_setup_socket
> > >
> > > As far as I can tell, we get here because the call to `kernel_connect'
> > > from within `xs_abort_connection' by rpciod is returning 0 sometimes
> > > (and EINPROGRESS others).  When it returns 0, we enter this block:
> > >
> > > 	if (!result)
> > > 		xs_sock_mark_closed(&transport->xprt);
> > > 	else
> > > 		dprintk("RPC: AF_UNSPEC connect return code %d\n", result);
> > >
> > > and that eventually gets us woken up via the call stack above.  The end
> > > result is the same, we wake up and move on to `call_transmit' before
> > > rpciod is done connecting the socket.
> > >
> > > I'll do some more tracing and see what else I can come up with.
> > >
> > > Chris
> >
> > OK, then let's just get rid of xs_error_report(). All the cases that we
> > care about will call tcp_done(), which will induce a state change when
> > needed.
> >
> > Cheers,
> >   Trond
> > 8<---------------------------------------------------------------------
> > From 047ee9ea072698f1d35573c1f26dc02d13a4af7f Mon Sep 17 00:00:00 2001
> > From: Trond Myklebust
> > Date: Mon, 22 Oct 2012 17:14:36 -0400
> > Subject: [PATCH] SUNRPC: Get rid of the xs_error_report socket callback
> >
> > Chris Perl reports that we're seeing races between the wakeup call in
> > xs_error_report and the connect attempts. Basically, Chris has shown
> > that in certain circumstances, the call to xs_error_report causes the
> > rpc_task that is responsible for reconnecting to wake up early, thus
> > triggering a disconnect and retry.
> >
> > Since the sk->sk_error_report() calls in the socket layer are always
> > followed by a tcp_done() in the cases where we care about waking up
> > the rpc_tasks, just let the state_change callbacks take responsibility
> > for those wake ups.
> >
> > Reported-by: Chris Perl
> > Signed-off-by: Trond Myklebust
> > ---
> >  net/sunrpc/xprtsock.c | 25 -------------------------
> >  1 file changed, 25 deletions(-)
> >
> > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
> > index aaaadfb..89bde21 100644
> > --- a/net/sunrpc/xprtsock.c
> > +++ b/net/sunrpc/xprtsock.c
> > @@ -254,7 +254,6 @@ struct sock_xprt {
> >  	void			(*old_data_ready)(struct sock *, int);
> >  	void			(*old_state_change)(struct sock *);
> >  	void			(*old_write_space)(struct sock *);
> > -	void			(*old_error_report)(struct sock *);
> >  };
> >
> >  /*
> > @@ -781,7 +780,6 @@ static void xs_save_old_callbacks(struct sock_xprt *transport, struct sock *sk)
> >  	transport->old_data_ready = sk->sk_data_ready;
> >  	transport->old_state_change = sk->sk_state_change;
> >  	transport->old_write_space = sk->sk_write_space;
> > -	transport->old_error_report = sk->sk_error_report;
> >  }
> >
> >  static void xs_restore_old_callbacks(struct sock_xprt *transport, struct sock *sk)
> > @@ -789,7 +787,6 @@ static void xs_restore_old_callbacks(struct sock_xprt *transport, struct sock *sk)
> >  	sk->sk_data_ready = transport->old_data_ready;
> >  	sk->sk_state_change = transport->old_state_change;
> >  	sk->sk_write_space = transport->old_write_space;
> > -	sk->sk_error_report = transport->old_error_report;
> >  }
> >
> >  static void xs_reset_transport(struct sock_xprt *transport)
> > @@ -1540,25 +1537,6 @@ static void xs_tcp_state_change(struct sock *sk)
> >  	read_unlock_bh(&sk->sk_callback_lock);
> >  }
> >
> > -/**
> > - * xs_error_report - callback mainly for catching socket errors
> > - * @sk: socket
> > - */
> > -static void xs_error_report(struct sock *sk)
> > -{
> > -	struct rpc_xprt *xprt;
> > -
> > -	read_lock_bh(&sk->sk_callback_lock);
> > -	if (!(xprt = xprt_from_sock(sk)))
> > -		goto out;
> > -	dprintk("RPC: %s client %p...\n"
> > -			"RPC: error %d\n",
> > -			__func__, xprt, sk->sk_err);
> > -	xprt_wake_pending_tasks(xprt, -EAGAIN);
> > -out:
> > -	read_unlock_bh(&sk->sk_callback_lock);
> > -}
> > -
> >  static void xs_write_space(struct sock *sk)
> >  {
> >  	struct socket *sock;
> > @@ -1858,7 +1836,6 @@ static int xs_local_finish_connecting(struct rpc_xprt *xprt,
> >  	sk->sk_user_data = xprt;
> >  	sk->sk_data_ready = xs_local_data_ready;
> >  	sk->sk_write_space = xs_udp_write_space;
> > -	sk->sk_error_report = xs_error_report;
> >  	sk->sk_allocation = GFP_ATOMIC;
> >
> >  	xprt_clear_connected(xprt);
> > @@ -1983,7 +1960,6 @@ static void xs_udp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> >  	sk->sk_user_data = xprt;
> >  	sk->sk_data_ready = xs_udp_data_ready;
> >  	sk->sk_write_space = xs_udp_write_space;
> > -	sk->sk_error_report = xs_error_report;
> >  	sk->sk_no_check = UDP_CSUM_NORCV;
> >  	sk->sk_allocation = GFP_ATOMIC;
> >
> > @@ -2098,7 +2074,6 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock)
> >  	sk->sk_data_ready = xs_tcp_data_ready;
> >  	sk->sk_state_change = xs_tcp_state_change;
> >  	sk->sk_write_space = xs_tcp_write_space;
> > -	sk->sk_error_report = xs_error_report;
> >  	sk->sk_allocation = GFP_ATOMIC;
> >
> >  	/* socket options */
>
> Hi Trond,
>
> First, thanks for being so responsive!  I never expected to get such
> quick feedback and help.
>
> I think perhaps you misread my last email.  Adding the check for
> `xprt_connecting(xprt)' to `xs_error_report' _did_ stop the early wake
> up from happening at that point in the code.  Removing the entire
> callback with your patch works just as well.

OK. I did misunderstand that.

> I have confirmed that I can still reproduce the problem with this
> patch.
>
> Now that we're not waking up early because of `xs_error_report', we're
> waking up early in another part of the code as I mentioned above.  I'm
> not entirely clear about all the details here yet.
>
> What I do know is that we're woken up with `tk_status' set to EAGAIN, and
> since `call_connect_status' doesn't do anything special for that, we move
> on to `call_transmit'.  Meanwhile, rpciod is still trying to connect the
> socket (eventually succeeding) and so we're racing.
>
> Question: should `call_connect_status' treat EAGAIN specially and reset
> `tk_action' to `call_connect' again (rather than just moving on to
> `call_transmit')?  I don't think this alone would fix things properly,
> but it seems logical to me.  Perhaps I'm missing something.

call_transmit() should normally just cause the rpc_task to detect that
the connection is down, and should cause it to go to call_connect in
order to fix everything. If it is causing a race, then I'm thinking that
is probably due to the call to xs_tcp_shutdown() when we get an EPIPE
error in xs_tcp_send_request(). That shutdown seems to be needed only in
the case when we're in TCP_CLOSE_WAIT, in which case what we really want
is to force the rpc_task to go to xprt_connect, which means clearing
xprt_connected().

Hmm.... Can you try applying the attached 2 patches?

Cheers,
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@netapp.com
www.netapp.com

From ed42390cf42a0ea483d7a8d7e2e2b397ec0a8f00 Mon Sep 17 00:00:00 2001
From: Trond Myklebust
Date: Tue, 23 Oct 2012 11:40:02 -0400
Subject: [PATCH 2/2] Revert "SUNRPC: Ensure we close the socket on EPIPE errors too..."

This reverts commit 55420c24a0d4d1fce70ca713f84aa00b6b74a70e.
Now that we clear the connected flag when entering TCP_CLOSE_WAIT, the
deadlock described in this commit is no longer possible. Instead, the
resulting call to xs_tcp_shutdown() can interfere with pending
reconnection attempts.

Reported-by: Chris Perl
Signed-off-by: Trond Myklebust
Cc: stable@vger.kernel.org
---
 net/sunrpc/xprtsock.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index 6e6967d..7e2dd0d 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -737,10 +737,10 @@ static int xs_tcp_send_request(struct rpc_task *task)
 	default:
 		dprintk("RPC: sendmsg returned unrecognized error %d\n",
 			-status);
 	case -ECONNRESET:
-	case -EPIPE:
 		xs_tcp_shutdown(xprt);
 	case -ECONNREFUSED:
 	case -ENOTCONN:
+	case -EPIPE:
 		clear_bit(SOCK_ASYNC_NOSPACE, &transport->sock->flags);
 	}
-- 
1.7.11.7
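
For reference, a minimal sketch of the change Chris asks about above:
having call_connect_status() treat EAGAIN as "transport not connected
yet" and loop back to call_connect instead of falling through to
call_transmit.  This is not one of the attached patches; it is only an
illustrative assumption built on the rpc_task fields and helpers of that
era (tk_status, tk_action, call_connect, call_transmit, rpc_exit), with
the handling of the other status codes elided.

/*
 * Illustrative sketch only, not from the posted patches: a possible
 * shape for call_connect_status() in net/sunrpc/clnt.c if EAGAIN were
 * handled by retrying the connect step.
 */
static void
call_connect_status(struct rpc_task *task)
{
        struct rpc_clnt *clnt = task->tk_client;
        int status = task->tk_status;

        task->tk_status = 0;
        if (status == -EAGAIN) {
                /* Transport not connected yet: go back to call_connect
                 * rather than racing rpciod by moving on to transmit. */
                task->tk_action = call_connect;
                return;
        }
        if (status >= 0) {
                clnt->cl_stats->netreconn++;
                task->tk_action = call_transmit;
                return;
        }
        /* Remaining error cases elided in this sketch. */
        rpc_exit(task, status);
}

As Chris himself notes, this alone would probably not be a complete fix;
the attached patches instead address the xs_tcp_shutdown() interference
with pending reconnection attempts directly.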