Message ID | 1442499509.2865.16.camel@primarydata.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, 17 Sep 2015, Trond Myklebust wrote: > Hi Russell, > > On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: > > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux > > wrote: > > > Following that idea, I just tried the patch below, and it seems to > > > work. > > > I don't know whether it handles all cases after a call to > > > kernel_connect(), > > > but it stops the multiple connection attempts: > > > > > > 1 0.000000 armada388 -> n2100 TCP 1009?nfs [SYN] Seq=3794066539 > > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 > > > WS=128 > > > 2 0.000414 n2100 -> armada388 TCP nfs?1009 [SYN, ACK] > > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 > > > TSval=870318939 TSecr=15712 WS=64 > > > 3 0.000787 armada388 -> n2100 TCP 1009?nfs [ACK] Seq=3794066540 > > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0x905379cc, [Check: RD LU MD XT DL] > > > 5 0.001566 n2100 -> armada388 TCP nfs?1009 [ACK] Seq=1884476523 > > > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0x905379cc, [Check: RD LU MD XT DL] > > > 7 0.001866 n2100 -> armada388 TCP nfs?1009 [ACK] Seq=1884476523 > > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), > > > [Allowed: RD LU MD XT DL] > > > 9 0.003415 armada388 -> n2100 TCP 1009?nfs [ACK] Seq=3794066780 > > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), > > > [Allowed: RD LU MD XT DL] > > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), > > > [Allowed: RD LU MD XT DL] > > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: > > > 0xe15fc9c9 > > > ... > > > > NFS people - any comments on this patch? Is it the correct way to > > solve > > this problem (please see the first message in this thread for the > > problem.) > > Without this patch, NFS is unusable as it tries to launch multiple > > new > > connections from the same port to the NFS server without giving the > > NFS > > server time to respond and establish the TCP connection. > > I agree that it addresses a real problem here, however there are a > couple of issues with the patch itself: > > AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and > TCP_CLOSE, so if the connection attempt fails, this patch leaves the > XPRT_CONNECTING flag set. > There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, > TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection > attempt by canceling the XPRT_CONNECTING state. > > How about the following? It is based on your patch, but adds a check to > ensure that xs_tcp_state_change() doesn't clear the 'connecting' state > more than once (which could otherwise still happen in the TCP_CLOSE > case). > > 8<------------------------------------------------------------------- > From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 > From: Trond Myklebust <trond.myklebust@primarydata.com> > Date: Wed, 16 Sep 2015 23:43:17 -0400 > Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete > before retrying > > Commit 718ba5b87343, moved the responsibility for unlocking the socket to > xs_tcp_setup_socket, meaning that the socket will be unlocked before we > know that it has finished trying to connect. The following patch is based on > an initial patch by Russell King to ensure that we delay clearing the > XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate > a connection attempt, or the connection attempt itself failed. > > Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") > Reported-by: Russell King <linux@arm.linux.org.uk> > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> This fixes up my network segmentation problem, tested on top of your "Fix races between socket connection and destroy code". Tested-by: Benjamin Coddington <bcodding@redhat.com> Ben > --- > include/linux/sunrpc/xprtsock.h | 3 +++ > net/sunrpc/xprtsock.c | 11 ++++++++--- > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h > index 7591788e9fbf..357e44c1a46b 100644 > --- a/include/linux/sunrpc/xprtsock.h > +++ b/include/linux/sunrpc/xprtsock.h > @@ -42,6 +42,7 @@ struct sock_xprt { > /* > * Connection of transports > */ > + unsigned long sock_state; > struct delayed_work connect_worker; > struct sockaddr_storage srcaddr; > unsigned short srcport; > @@ -76,6 +77,8 @@ struct sock_xprt { > */ > #define TCP_RPC_REPLY (1UL << 6) > > +#define XPRT_SOCK_CONNECTING 1U > + > #endif /* __KERNEL__ */ > > #endif /* _LINUX_SUNRPC_XPRTSOCK_H */ > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index 7be90bc1a7c2..5bac27983e2a 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -1435,6 +1435,7 @@ out: > static void xs_tcp_state_change(struct sock *sk) > { > struct rpc_xprt *xprt; > + struct sock_xprt *transport; > > read_lock_bh(&sk->sk_callback_lock); > if (!(xprt = xprt_from_sock(sk))) > @@ -1446,13 +1447,12 @@ static void xs_tcp_state_change(struct sock *sk) > sock_flag(sk, SOCK_ZAPPED), > sk->sk_shutdown); > > + transport = container_of(xprt, struct sock_xprt, xprt); > trace_rpc_socket_state_change(xprt, sk->sk_socket); > switch (sk->sk_state) { > case TCP_ESTABLISHED: > spin_lock(&xprt->transport_lock); > if (!xprt_test_and_set_connected(xprt)) { > - struct sock_xprt *transport = container_of(xprt, > - struct sock_xprt, xprt); > > /* Reset TCP record info */ > transport->tcp_offset = 0; > @@ -1461,6 +1461,8 @@ static void xs_tcp_state_change(struct sock *sk) > transport->tcp_flags = > TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; > xprt->connect_cookie++; > + clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state); > + xprt_clear_connecting(xprt); > > xprt_wake_pending_tasks(xprt, -EAGAIN); > } > @@ -1496,6 +1498,9 @@ static void xs_tcp_state_change(struct sock *sk) > smp_mb__after_atomic(); > break; > case TCP_CLOSE: > + if (test_and_clear_bit(XPRT_SOCK_CONNECTING, > + &transport->sock_state)) > + xprt_clear_connecting(xprt); > xs_sock_mark_closed(xprt); > } > out: > @@ -2179,6 +2184,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock) > /* Tell the socket layer to start connecting... */ > xprt->stat.connect_count++; > xprt->stat.connect_start = jiffies; > + set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state); > ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK); > switch (ret) { > case 0: > @@ -2240,7 +2246,6 @@ static void xs_tcp_setup_socket(struct work_struct *work) > case -EINPROGRESS: > case -EALREADY: > xprt_unlock_connect(xprt, transport); > - xprt_clear_connecting(xprt); > return; > case -EINVAL: > /* Happens, for instance, if the user specified a link > -- > 2.4.3 > > -- > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com > > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-nfs" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html >
On Thu, Sep 17, 2015 at 10:18:29AM -0400, Trond Myklebust wrote: > Hi Russell, > > On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: > > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux > > wrote: > > > Following that idea, I just tried the patch below, and it seems to > > > work. > > > I don't know whether it handles all cases after a call to > > > kernel_connect(), > > > but it stops the multiple connection attempts: > > > > > > 1 0.000000 armada388 -> n2100 TCP 1009?nfs [SYN] Seq=3794066539 > > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 > > > WS=128 > > > 2 0.000414 n2100 -> armada388 TCP nfs?1009 [SYN, ACK] > > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 > > > TSval=870318939 TSecr=15712 WS=64 > > > 3 0.000787 armada388 -> n2100 TCP 1009?nfs [ACK] Seq=3794066540 > > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0x905379cc, [Check: RD LU MD XT DL] > > > 5 0.001566 n2100 -> armada388 TCP nfs?1009 [ACK] Seq=1884476523 > > > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0x905379cc, [Check: RD LU MD XT DL] > > > 7 0.001866 n2100 -> armada388 TCP nfs?1009 [ACK] Seq=1884476523 > > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 > > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), > > > [Allowed: RD LU MD XT DL] > > > 9 0.003415 armada388 -> n2100 TCP 1009?nfs [ACK] Seq=3794066780 > > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 > > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), > > > [Allowed: RD LU MD XT DL] > > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: > > > 0xe15fc9c9, [Check: RD LU MD XT DL] > > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), > > > [Allowed: RD LU MD XT DL] > > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: > > > 0xe15fc9c9 > > > ... > > > > NFS people - any comments on this patch? Is it the correct way to > > solve > > this problem (please see the first message in this thread for the > > problem.) > > Without this patch, NFS is unusable as it tries to launch multiple > > new > > connections from the same port to the NFS server without giving the > > NFS > > server time to respond and establish the TCP connection. > > I agree that it addresses a real problem here, however there are a > couple of issues with the patch itself: > > AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and > TCP_CLOSE, so if the connection attempt fails, this patch leaves the > XPRT_CONNECTING flag set. > There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, > TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection > attempt by canceling the XPRT_CONNECTING state. > > How about the following? It is based on your patch, but adds a check to > ensure that xs_tcp_state_change() doesn't clear the 'connecting' state > more than once (which could otherwise still happen in the TCP_CLOSE > case). This patch also seems to fix the problem I've been seeing. Yes, I wasn't sure about my patch - I didn't spend much time properly reading and understanding the sunrpc code, beyond analysing what was going on to cause the problem and deciding on a way to stop it happening. I really wasn't sure that clearing the connecting flag everywhere I did was the right thing, which is why I didn't send the patch properly dressed up. > 8<------------------------------------------------------------------- > >From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 > From: Trond Myklebust <trond.myklebust@primarydata.com> > Date: Wed, 16 Sep 2015 23:43:17 -0400 > Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete > before retrying > > Commit 718ba5b87343, moved the responsibility for unlocking the socket to > xs_tcp_setup_socket, meaning that the socket will be unlocked before we > know that it has finished trying to connect. The following patch is based on > an initial patch by Russell King to ensure that we delay clearing the > XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate > a connection attempt, or the connection attempt itself failed. > > Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") > Reported-by: Russell King <linux@arm.linux.org.uk> Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> Thanks. > Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> > --- > include/linux/sunrpc/xprtsock.h | 3 +++ > net/sunrpc/xprtsock.c | 11 ++++++++--- > 2 files changed, 11 insertions(+), 3 deletions(-) > > diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h > index 7591788e9fbf..357e44c1a46b 100644 > --- a/include/linux/sunrpc/xprtsock.h > +++ b/include/linux/sunrpc/xprtsock.h > @@ -42,6 +42,7 @@ struct sock_xprt { > /* > * Connection of transports > */ > + unsigned long sock_state; > struct delayed_work connect_worker; > struct sockaddr_storage srcaddr; > unsigned short srcport; > @@ -76,6 +77,8 @@ struct sock_xprt { > */ > #define TCP_RPC_REPLY (1UL << 6) > > +#define XPRT_SOCK_CONNECTING 1U > + > #endif /* __KERNEL__ */ > > #endif /* _LINUX_SUNRPC_XPRTSOCK_H */ > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index 7be90bc1a7c2..5bac27983e2a 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -1435,6 +1435,7 @@ out: > static void xs_tcp_state_change(struct sock *sk) > { > struct rpc_xprt *xprt; > + struct sock_xprt *transport; > > read_lock_bh(&sk->sk_callback_lock); > if (!(xprt = xprt_from_sock(sk))) > @@ -1446,13 +1447,12 @@ static void xs_tcp_state_change(struct sock *sk) > sock_flag(sk, SOCK_ZAPPED), > sk->sk_shutdown); > > + transport = container_of(xprt, struct sock_xprt, xprt); > trace_rpc_socket_state_change(xprt, sk->sk_socket); > switch (sk->sk_state) { > case TCP_ESTABLISHED: > spin_lock(&xprt->transport_lock); > if (!xprt_test_and_set_connected(xprt)) { > - struct sock_xprt *transport = container_of(xprt, > - struct sock_xprt, xprt); > > /* Reset TCP record info */ > transport->tcp_offset = 0; > @@ -1461,6 +1461,8 @@ static void xs_tcp_state_change(struct sock *sk) > transport->tcp_flags = > TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; > xprt->connect_cookie++; > + clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state); > + xprt_clear_connecting(xprt); > > xprt_wake_pending_tasks(xprt, -EAGAIN); > } > @@ -1496,6 +1498,9 @@ static void xs_tcp_state_change(struct sock *sk) > smp_mb__after_atomic(); > break; > case TCP_CLOSE: > + if (test_and_clear_bit(XPRT_SOCK_CONNECTING, > + &transport->sock_state)) > + xprt_clear_connecting(xprt); > xs_sock_mark_closed(xprt); > } > out: > @@ -2179,6 +2184,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock) > /* Tell the socket layer to start connecting... */ > xprt->stat.connect_count++; > xprt->stat.connect_start = jiffies; > + set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state); > ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK); > switch (ret) { > case 0: > @@ -2240,7 +2246,6 @@ static void xs_tcp_setup_socket(struct work_struct *work) > case -EINPROGRESS: > case -EALREADY: > xprt_unlock_connect(xprt, transport); > - xprt_clear_connecting(xprt); > return; > case -EINVAL: > /* Happens, for instance, if the user specified a link > -- > 2.4.3 > > -- > Trond Myklebust > Linux NFS client maintainer, PrimaryData > trond.myklebust@primarydata.com > > >
On Thu, Sep 17, 2015 at 5:47 PM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Thu, Sep 17, 2015 at 10:18:29AM -0400, Trond Myklebust wrote: >> Hi Russell, >> >> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: >> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux >> > wrote: >> > > Following that idea, I just tried the patch below, and it seems to >> > > work. >> > > I don't know whether it handles all cases after a call to >> > > kernel_connect(), >> > > but it stops the multiple connection attempts: >> > > >> > > 1 0.000000 armada388 -> n2100 TCP 1009?nfs [SYN] Seq=3794066539 >> > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 >> > > WS=128 >> > > 2 0.000414 n2100 -> armada388 TCP nfs?1009 [SYN, ACK] >> > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 >> > > TSval=870318939 TSecr=15712 WS=64 >> > > 3 0.000787 armada388 -> n2100 TCP 1009?nfs [ACK] Seq=3794066540 >> > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 >> > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0x905379cc, [Check: RD LU MD XT DL] >> > > 5 0.001566 n2100 -> armada388 TCP nfs?1009 [ACK] Seq=1884476523 >> > > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712 >> > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0x905379cc, [Check: RD LU MD XT DL] >> > > 7 0.001866 n2100 -> armada388 TCP nfs?1009 [ACK] Seq=1884476523 >> > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 >> > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), >> > > [Allowed: RD LU MD XT DL] >> > > 9 0.003415 armada388 -> n2100 TCP 1009?nfs [ACK] Seq=3794066780 >> > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 >> > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0xe15fc9c9, [Check: RD LU MD XT DL] >> > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), >> > > [Allowed: RD LU MD XT DL] >> > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0xe15fc9c9, [Check: RD LU MD XT DL] >> > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), >> > > [Allowed: RD LU MD XT DL] >> > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: >> > > 0xe15fc9c9 >> > > ... >> > >> > NFS people - any comments on this patch? Is it the correct way to >> > solve >> > this problem (please see the first message in this thread for the >> > problem.) >> > Without this patch, NFS is unusable as it tries to launch multiple >> > new >> > connections from the same port to the NFS server without giving the >> > NFS >> > server time to respond and establish the TCP connection. >> >> I agree that it addresses a real problem here, however there are a >> couple of issues with the patch itself: >> >> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and >> TCP_CLOSE, so if the connection attempt fails, this patch leaves the >> XPRT_CONNECTING flag set. >> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, >> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection >> attempt by canceling the XPRT_CONNECTING state. >> >> How about the following? It is based on your patch, but adds a check to >> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state >> more than once (which could otherwise still happen in the TCP_CLOSE >> case). > > This patch also seems to fix the problem I've been seeing. > > Yes, I wasn't sure about my patch - I didn't spend much time properly > reading and understanding the sunrpc code, beyond analysing what was > going on to cause the problem and deciding on a way to stop it happening. > I really wasn't sure that clearing the connecting flag everywhere I did > was the right thing, which is why I didn't send the patch properly > dressed up. > >> 8<------------------------------------------------------------------- >> >From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 >> From: Trond Myklebust <trond.myklebust@primarydata.com> >> Date: Wed, 16 Sep 2015 23:43:17 -0400 >> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete >> before retrying >> >> Commit 718ba5b87343, moved the responsibility for unlocking the socket to >> xs_tcp_setup_socket, meaning that the socket will be unlocked before we >> know that it has finished trying to connect. The following patch is based on >> an initial patch by Russell King to ensure that we delay clearing the >> XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate >> a connection attempt, or the connection attempt itself failed. >> >> Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") >> Reported-by: Russell King <linux@arm.linux.org.uk> > > Reported-by: Russell King <rmk+kernel@arm.linux.org.uk> > Tested-by: Russell King <rmk+kernel@arm.linux.org.uk> > Thanks Russell!
On Thu, Sep 17, 2015 at 12:27 PM, Benjamin Coddington <bcodding@redhat.com> wrote: > On Thu, 17 Sep 2015, Trond Myklebust wrote: > >> Hi Russell, >> >> On Thu, 2015-09-17 at 14:57 +0100, Russell King - ARM Linux wrote: >> > On Fri, Sep 11, 2015 at 05:49:38PM +0100, Russell King - ARM Linux >> > wrote: >> > > Following that idea, I just tried the patch below, and it seems to >> > > work. >> > > I don't know whether it handles all cases after a call to >> > > kernel_connect(), >> > > but it stops the multiple connection attempts: >> > > >> > > 1 0.000000 armada388 -> n2100 TCP 1009?nfs [SYN] Seq=3794066539 >> > > Win=28560 Len=0 MSS=1440 SACK_PERM=1 TSval=15712 TSecr=870317691 >> > > WS=128 >> > > 2 0.000414 n2100 -> armada388 TCP nfs?1009 [SYN, ACK] >> > > Seq=1884476522 Ack=3794066540 Win=28560 Len=0 MSS=1440 SACK_PERM=1 >> > > TSval=870318939 TSecr=15712 WS=64 >> > > 3 0.000787 armada388 -> n2100 TCP 1009?nfs [ACK] Seq=3794066540 >> > > Ack=1884476523 Win=28672 Len=0 TSval=15712 TSecr=870318939 >> > > 4 0.001304 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0x905379cc, [Check: RD LU MD XT DL] >> > > 5 0.001566 n2100 -> armada388 TCP nfs?1009 [ACK] Seq=1884476523 >> > > Ack=3794066660 Win=28608 Len=0 TSval=870318939 TSecr=15712 >> > > 6 0.001640 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0x905379cc, [Check: RD LU MD XT DL] >> > > 7 0.001866 n2100 -> armada388 TCP nfs?1009 [ACK] Seq=1884476523 >> > > Ack=3794066780 Win=28608 Len=0 TSval=870318939 TSecr=15712 >> > > 8 0.003070 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 4), >> > > [Allowed: RD LU MD XT DL] >> > > 9 0.003415 armada388 -> n2100 TCP 1009?nfs [ACK] Seq=3794066780 >> > > Ack=1884476647 Win=28672 Len=0 TSval=15712 TSecr=870318939 >> > > 10 0.003592 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0xe15fc9c9, [Check: RD LU MD XT DL] >> > > 11 0.004354 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 6), >> > > [Allowed: RD LU MD XT DL] >> > > 12 0.004682 armada388 -> n2100 NFS V3 ACCESS Call, FH: >> > > 0xe15fc9c9, [Check: RD LU MD XT DL] >> > > 13 0.005365 n2100 -> armada388 NFS V3 ACCESS Reply (Call In 10), >> > > [Allowed: RD LU MD XT DL] >> > > 14 0.005701 armada388 -> n2100 NFS V3 GETATTR Call, FH: >> > > 0xe15fc9c9 >> > > ... >> > >> > NFS people - any comments on this patch? Is it the correct way to >> > solve >> > this problem (please see the first message in this thread for the >> > problem.) >> > Without this patch, NFS is unusable as it tries to launch multiple >> > new >> > connections from the same port to the NFS server without giving the >> > NFS >> > server time to respond and establish the TCP connection. >> >> I agree that it addresses a real problem here, however there are a >> couple of issues with the patch itself: >> >> AFAICS, the 2 possible next states for SYN_SENT are TCP_ESTABLISHED and >> TCP_CLOSE, so if the connection attempt fails, this patch leaves the >> XPRT_CONNECTING flag set. >> There is also the issue that clearing XPRT_CONNECTING in TCP_FIN_WAIT1, >> TCP_CLOSE_WAIT and TCP_CLOSING could interfere with another connection >> attempt by canceling the XPRT_CONNECTING state. >> >> How about the following? It is based on your patch, but adds a check to >> ensure that xs_tcp_state_change() doesn't clear the 'connecting' state >> more than once (which could otherwise still happen in the TCP_CLOSE >> case). >> >> 8<------------------------------------------------------------------- >> From 4dbfdebbc09982a9248866f8256549456e2b2efd Mon Sep 17 00:00:00 2001 >> From: Trond Myklebust <trond.myklebust@primarydata.com> >> Date: Wed, 16 Sep 2015 23:43:17 -0400 >> Subject: [PATCH] SUNRPC: Ensure that we wait for connections to complete >> before retrying >> >> Commit 718ba5b87343, moved the responsibility for unlocking the socket to >> xs_tcp_setup_socket, meaning that the socket will be unlocked before we >> know that it has finished trying to connect. The following patch is based on >> an initial patch by Russell King to ensure that we delay clearing the >> XPRT_SOCK_CONNECTING flag until we either know that we failed to initiate >> a connection attempt, or the connection attempt itself failed. >> >> Fixes: 718ba5b87343 ("SUNRPC: Add helpers to prevent socket create from racing") >> Reported-by: Russell King <linux@arm.linux.org.uk> >> Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com> > > This fixes up my network segmentation problem, tested on top of your "Fix > races between socket connection and destroy code". > > Tested-by: Benjamin Coddington <bcodding@redhat.com> > Thanks Ben!
diff --git a/include/linux/sunrpc/xprtsock.h b/include/linux/sunrpc/xprtsock.h index 7591788e9fbf..357e44c1a46b 100644 --- a/include/linux/sunrpc/xprtsock.h +++ b/include/linux/sunrpc/xprtsock.h @@ -42,6 +42,7 @@ struct sock_xprt { /* * Connection of transports */ + unsigned long sock_state; struct delayed_work connect_worker; struct sockaddr_storage srcaddr; unsigned short srcport; @@ -76,6 +77,8 @@ struct sock_xprt { */ #define TCP_RPC_REPLY (1UL << 6) +#define XPRT_SOCK_CONNECTING 1U + #endif /* __KERNEL__ */ #endif /* _LINUX_SUNRPC_XPRTSOCK_H */ diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 7be90bc1a7c2..5bac27983e2a 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -1435,6 +1435,7 @@ out: static void xs_tcp_state_change(struct sock *sk) { struct rpc_xprt *xprt; + struct sock_xprt *transport; read_lock_bh(&sk->sk_callback_lock); if (!(xprt = xprt_from_sock(sk))) @@ -1446,13 +1447,12 @@ static void xs_tcp_state_change(struct sock *sk) sock_flag(sk, SOCK_ZAPPED), sk->sk_shutdown); + transport = container_of(xprt, struct sock_xprt, xprt); trace_rpc_socket_state_change(xprt, sk->sk_socket); switch (sk->sk_state) { case TCP_ESTABLISHED: spin_lock(&xprt->transport_lock); if (!xprt_test_and_set_connected(xprt)) { - struct sock_xprt *transport = container_of(xprt, - struct sock_xprt, xprt); /* Reset TCP record info */ transport->tcp_offset = 0; @@ -1461,6 +1461,8 @@ static void xs_tcp_state_change(struct sock *sk) transport->tcp_flags = TCP_RCV_COPY_FRAGHDR | TCP_RCV_COPY_XID; xprt->connect_cookie++; + clear_bit(XPRT_SOCK_CONNECTING, &transport->sock_state); + xprt_clear_connecting(xprt); xprt_wake_pending_tasks(xprt, -EAGAIN); } @@ -1496,6 +1498,9 @@ static void xs_tcp_state_change(struct sock *sk) smp_mb__after_atomic(); break; case TCP_CLOSE: + if (test_and_clear_bit(XPRT_SOCK_CONNECTING, + &transport->sock_state)) + xprt_clear_connecting(xprt); xs_sock_mark_closed(xprt); } out: @@ -2179,6 +2184,7 @@ static int xs_tcp_finish_connecting(struct rpc_xprt *xprt, struct socket *sock) /* Tell the socket layer to start connecting... */ xprt->stat.connect_count++; xprt->stat.connect_start = jiffies; + set_bit(XPRT_SOCK_CONNECTING, &transport->sock_state); ret = kernel_connect(sock, xs_addr(xprt), xprt->addrlen, O_NONBLOCK); switch (ret) { case 0: @@ -2240,7 +2246,6 @@ static void xs_tcp_setup_socket(struct work_struct *work) case -EINPROGRESS: case -EALREADY: xprt_unlock_connect(xprt, transport); - xprt_clear_connecting(xprt); return; case -EINVAL: /* Happens, for instance, if the user specified a link