From patchwork Fri Sep 11 16:49:38 2015
X-Patchwork-Submitter: Russell King - ARM Linux
X-Patchwork-Id: 7162601
Date: Fri, 11 Sep 2015 17:49:38 +0100
From: Russell King - ARM Linux
To: Eric Dumazet
Cc: netdev@vger.kernel.org, linux-nfs@vger.kernel.org, Trond Myklebust,
 Anna Schumaker, linux-arm-kernel@lists.infradead.org
Subject: Re: NFS/TCP/IPv6 acting strangely in 4.2
Message-ID: <20150911164937.GW21084@n2100.arm.linux.org.uk>
In-Reply-To: <20150911162416.GV21084@n2100.arm.linux.org.uk>
References: <20150911113839.GO21084@n2100.arm.linux.org.uk>
 <1441976691.4619.58.camel@edumazet-glaptop2.roam.corp.google.com>
 <20150911143347.GQ21084@n2100.arm.linux.org.uk>
 <20150911150613.GR21084@n2100.arm.linux.org.uk>
 <1441984723.4619.61.camel@edumazet-glaptop2.roam.corp.google.com>
 <20150911162416.GV21084@n2100.arm.linux.org.uk>

On Fri, Sep 11, 2015 at 05:24:17PM +0100, Russell King - ARM Linux wrote:
> On Fri, Sep 11, 2015 at 08:18:43AM -0700, Eric Dumazet wrote:
> > On Fri, 2015-09-11 at 16:06 +0100, Russell King - ARM Linux wrote:
> > > On Fri, Sep 11, 2015 at 03:33:47PM +0100, Russell King - ARM Linux wrote:
> > > > It looks like 0c78789e3a030615c6650fde89546cadf40ec2cc might be relevant
> > > > too, but I don't see that solving the multiple _concurrent_ connection
> > > > attempts with the same port number - presumably it's somehow trying to
> > > > make the same socket repeatedly connect despite a previous connection
> > > > being in progress, which would have nothing to do with cleaning up a
> > > > previous attempt.
> > >
> > > As I suspected, applying the above commit in addition does not solve the
> > > problem, I still see the same behaviour: SYN SYNACK SYN RSTACK, SYN
> > > SYNACK SYN RSTACK, and eventual SYN storms.
> > >
> > > I do have this captured as well:
> > >
> > > 2558 0.834316 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1053655487 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60001 TSecr=0 WS=128
> > > 2559 0.834572 n2100 -> armada388 TCP nfs→rpasswd [SYN, ACK] Seq=3076611574 Ack=1053655488 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=869622246 TSecr=60001 WS=64
> > > 2560 0.834666 armada388 -> n2100 TCP [TCP Port numbers reused] rpasswd→nfs [SYN] Seq=1054228544 Win=28800 Len=0 MSS=1440 SACK_PERM=1 TSval=60005 TSecr=0 WS=128
> > > 2561 0.834895 n2100 -> armada388 TCP nfs→rpasswd [ACK] Seq=3076611575 Ack=1053655488 Win=28560 Len=0 TSval=869622246 TSecr=60001
> > >
> > > The packet at 2561 looks wrong to me - this doesn't follow what I know
> > > would be the standard TCP setup of syn, synack, ack, because that final
> > > ack is in the wrong direction.
> >
> > This 2561 packet is an ACK packet, because n2100 has a SYN_RECV socket
> > created by packet 2558.
> >
> > It receives a SYN packet (2560) that it interprets as a packet slightly
> > out of sequence (1054228544 being above 1053655487) for this SYN_RECV
> >
> > The wrong packet is 2560, not 2561
>
> Ok.
>
> Looking deeper at the XPRT sunrpc code, I have to wonder about the
> sanity of this:
>
> void xprt_connect(struct rpc_task *task)
> {
> ...
> 	if (!xprt_connected(xprt)) {
> ...
> 		if (test_bit(XPRT_CLOSING, &xprt->state))
> 			return;
> 		if (xprt_test_and_set_connecting(xprt))
> 			return;
> 		xprt->stat.connect_start = jiffies;
> 		xprt->ops->connect(xprt, task);
>
> That calls into xs_connect(), which schedules a workqueue to do the
> connection. The workqueue will call xs_tcp_setup_socket().
>
> xs_tcp_setup_socket() creates a socket if one didn't exist, otherwise
> re-using the previously obtained socket (which'll be why it's using the
> same socket) and then goes on to call xs_tcp_finish_connecting().
>
> xs_tcp_finish_connecting() calls kernel_connect(), which will return
> -EINPROGRESS. We seem to treat -EINPROGRESS as if the connection was
> successful:
>
> 	case 0:
> 	case -EINPROGRESS:
> 	case -EALREADY:
> 		xprt_unlock_connect(xprt, transport);
> 		xprt_clear_connecting(xprt);
> 		return;
>
> and the xprt_clear_connecting() results in this whole path being
> re-opened: the socket is not yet connected, so xprt_connected() will
> return false, and despite the socket connection still being mid-way
> through being connected, we clear the "connecting" status, causing
> xprt_test_and_set_connecting() to return false.
>
> That allows us to re-call xprt->ops->connect, re-queue the connect
> worker, and re-run the call to kernel_connect() for a socket which is
> already mid-way through being connected.
>
> Shouldn't the "connecting" status only be cleared when either the socket
> has _finished_ connecting, or when the connection has _failed_ to connect,
> and not when it's mid-way through connecting?
>
> I've not been able to prove this: I've set rpc_debug to 129 to log
> just the xprt and trans RPC facilities, and that's sufficient to change
> the timing such that this doesn't happen.

Following that idea, I just tried the patch below, and it seems to work.
I don't know whether it handles all cases after a call to kernel_connect(),
but it stops the multiple connection attempts:

 1 0.000000 armada388 -> n2100 TCP 1009→nfs [SYN] Seq=3794066539 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 WS=128
 2 0.000414 n2100 -> armada388 TCP nfs→1009 [SYN, ACK] Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=870318939 TSecr=15712 WS=64
 3 0.000787 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066540 Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939
 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
 5 0.001566 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712
 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0x905379cc, [Check: RD LU MD XT DL]
 7 0.001866 n2100 -> armada388 TCP nfs→1009 [ACK] Seq=1884476523 Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712
 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), [Allowed: RD LU MD XT DL]
 9 0.003415 armada388 -> n2100 TCP 1009→nfs [ACK] Seq=3794066780 Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939
10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), [Allowed: RD LU MD XT DL]
12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: 0xe15fc9c9, [Check: RD LU MD XT DL]
13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), [Allowed: RD LU MD XT DL]
14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: 0xe15fc9c9
...
 net/sunrpc/xprtsock.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c
index ff5b6a2e62c3..c456d6e51c56 100644
--- a/net/sunrpc/xprtsock.c
+++ b/net/sunrpc/xprtsock.c
@@ -1450,6 +1450,7 @@ static void xs_tcp_state_change(struct sock *sk)
 	switch (sk->sk_state) {
 	case TCP_ESTABLISHED:
 		spin_lock(&xprt->transport_lock);
+		xprt_clear_connecting(xprt);
 		if (!xprt_test_and_set_connected(xprt)) {
 			struct sock_xprt *transport = container_of(xprt,
 					struct sock_xprt, xprt);
@@ -1474,12 +1475,14 @@
 		smp_mb__before_atomic();
 		clear_bit(XPRT_CONNECTED, &xprt->state);
 		clear_bit(XPRT_CLOSE_WAIT, &xprt->state);
+		clear_bit(XPRT_CONNECTING, &xprt->state);
 		smp_mb__after_atomic();
 		break;
 	case TCP_CLOSE_WAIT:
 		/* The server initiated a shutdown of the socket */
 		xprt->connect_cookie++;
 		clear_bit(XPRT_CONNECTED, &xprt->state);
+		clear_bit(XPRT_CONNECTING, &xprt->state);
 		xs_tcp_force_close(xprt);
 	case TCP_CLOSING:
 		/*
@@ -1493,6 +1496,7 @@
 		set_bit(XPRT_CLOSING, &xprt->state);
 		smp_mb__before_atomic();
 		clear_bit(XPRT_CONNECTED, &xprt->state);
+		clear_bit(XPRT_CONNECTING, &xprt->state);
 		smp_mb__after_atomic();
 		break;
 	case TCP_CLOSE:
@@ -2237,11 +2241,13 @@ static void xs_tcp_setup_socket(struct work_struct *work)
 		xs_tcp_force_close(xprt);
 		break;
 	case 0:
-	case -EINPROGRESS:
 	case -EALREADY:
 		xprt_unlock_connect(xprt, transport);
 		xprt_clear_connecting(xprt);
 		return;
+	case -EINPROGRESS:
+		xprt_unlock_connect(xprt, transport);
+		return;
 	case -EINVAL:
 		/* Happens, for instance, if the user specified a link
 		 * local IPv6 address without a scope-id.