Message ID | 20181212135157.4489-2-dwysocha@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | SUNRPC: Ensure XPRT_CONNECTED is cleared while handling TCP RST | expand |
On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote: > Commit 9b30889c548a changed the handling of TCP_CLOSE inside > xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED bit > was cleared unconditionally inside xprt_disconnect_done, similar > to the handling of TCP_CLOSE_WAIT. After the change the clearing > of XPRT_CONNECTED depends on successfully queueing a work based > xprt_autoclose which depends on XPRT_LOCKED and may not happen. > This is significant in the case of an unexpected RST from the > server, as the client will only see xs_tcp_state_change called with > sk_state == TCP_CLOSE. Restore the unconditional clear_bit on > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent > with handling TCP_CLOSE_WAIT. > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > --- > net/sunrpc/xprtsock.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > index 8a5e823e0b33..b9789036051d 100644 > --- a/net/sunrpc/xprtsock.c > +++ b/net/sunrpc/xprtsock.c > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock > *sk) > if (sk->sk_err) > xprt_wake_pending_tasks(xprt, -sk->sk_err); > /* Trigger the socket release */ > + clear_bit(XPRT_CONNECTED, &xprt->state); > xs_tcp_force_close(xprt); > } > out: Hi Dave, This isn't needed for 4.20 or newer because call_transmit() will now always call xprt_end_transmit(). I suggest that a stable fix do something similar (perhaps conditional on the error returned by xprt_transmit()?). Cheers Trond
On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote: > On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote: > > Commit 9b30889c548a changed the handling of TCP_CLOSE inside > > xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED bit > > was cleared unconditionally inside xprt_disconnect_done, similar > > to the handling of TCP_CLOSE_WAIT. After the change the clearing > > of XPRT_CONNECTED depends on successfully queueing a work based > > xprt_autoclose which depends on XPRT_LOCKED and may not happen. > > This is significant in the case of an unexpected RST from the > > server, as the client will only see xs_tcp_state_change called with > > sk_state == TCP_CLOSE. Restore the unconditional clear_bit on > > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent > > with handling TCP_CLOSE_WAIT. > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > > --- > > net/sunrpc/xprtsock.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > index 8a5e823e0b33..b9789036051d 100644 > > --- a/net/sunrpc/xprtsock.c > > +++ b/net/sunrpc/xprtsock.c > > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock > > *sk) > > if (sk->sk_err) > > xprt_wake_pending_tasks(xprt, -sk- > > >sk_err); > > /* Trigger the socket release */ > > + clear_bit(XPRT_CONNECTED, &xprt->state); > > xs_tcp_force_close(xprt); > > } > > out: > > Hi Dave, > > This isn't needed for 4.20 or newer because call_transmit() will now > always call xprt_end_transmit(). I suggest that a stable fix do > something similar (perhaps conditional on the error returned by > xprt_transmit()?). > Can you explain the handling in xs_tcp_state_change - why XPRT_CONNECTED would need to remain set longer in the case of handling TCP_CLOSE case rather than TCP_CLOSE_WAIT? It seems like if we get an RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED bit getting cleared need to be delayed in that case? I will look into the approach you mention though at the moment I do not see how it helps because the underlying issue seems to be clearing of the XPRT_CONNECTED bit. Thanks.
On Wed, 2018-12-12 at 12:47 -0500, Dave Wysochanski wrote: > On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote: > > On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote: > > > Commit 9b30889c548a changed the handling of TCP_CLOSE inside > > > xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED > > > bit > > > was cleared unconditionally inside xprt_disconnect_done, similar > > > to the handling of TCP_CLOSE_WAIT. After the change the clearing > > > of XPRT_CONNECTED depends on successfully queueing a work based > > > xprt_autoclose which depends on XPRT_LOCKED and may not happen. > > > This is significant in the case of an unexpected RST from the > > > server, as the client will only see xs_tcp_state_change called > > > with > > > sk_state == TCP_CLOSE. Restore the unconditional clear_bit on > > > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent > > > with handling TCP_CLOSE_WAIT. > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > > > --- > > > net/sunrpc/xprtsock.c | 1 + > > > 1 file changed, 1 insertion(+) > > > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > > index 8a5e823e0b33..b9789036051d 100644 > > > --- a/net/sunrpc/xprtsock.c > > > +++ b/net/sunrpc/xprtsock.c > > > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock > > > *sk) > > > if (sk->sk_err) > > > xprt_wake_pending_tasks(xprt, -sk- > > > > sk_err); > > > /* Trigger the socket release */ > > > + clear_bit(XPRT_CONNECTED, &xprt->state); > > > xs_tcp_force_close(xprt); > > > } > > > out: > > > > Hi Dave, > > > > This isn't needed for 4.20 or newer because call_transmit() will > > now > > always call xprt_end_transmit(). I suggest that a stable fix do > > something similar (perhaps conditional on the error returned by > > xprt_transmit()?). > > > > Can you explain the handling in xs_tcp_state_change - why > XPRT_CONNECTED would need to remain set longer in the case of > handling > TCP_CLOSE case rather than TCP_CLOSE_WAIT? It seems like if we get > an > RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED > bit > getting cleared need to be delayed in that case? > > I will look into the approach you mention though at the moment I do > not > see how it helps because the underlying issue seems to be clearing of > the XPRT_CONNECTED bit. > See xprt_clear_locked(). Dropping the XPRT_LOCK will automatically trigger autoclose if XPRT_CLOSE_WAIT is set, regardless of whether or not there are other tasks waiting for the lock.
On Wed, 2018-12-12 at 18:02 +0000, Trond Myklebust wrote: > On Wed, 2018-12-12 at 12:47 -0500, Dave Wysochanski wrote: > > On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote: > > > On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote: > > > > Commit 9b30889c548a changed the handling of TCP_CLOSE inside > > > > xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED > > > > bit > > > > was cleared unconditionally inside xprt_disconnect_done, > > > > similar > > > > to the handling of TCP_CLOSE_WAIT. After the change the > > > > clearing > > > > of XPRT_CONNECTED depends on successfully queueing a work based > > > > xprt_autoclose which depends on XPRT_LOCKED and may not happen. > > > > This is significant in the case of an unexpected RST from the > > > > server, as the client will only see xs_tcp_state_change called > > > > with > > > > sk_state == TCP_CLOSE. Restore the unconditional clear_bit on > > > > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent > > > > with handling TCP_CLOSE_WAIT. > > > > > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > > > > --- > > > > net/sunrpc/xprtsock.c | 1 + > > > > 1 file changed, 1 insertion(+) > > > > > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > > > index 8a5e823e0b33..b9789036051d 100644 > > > > --- a/net/sunrpc/xprtsock.c > > > > +++ b/net/sunrpc/xprtsock.c > > > > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct > > > > sock > > > > *sk) > > > > if (sk->sk_err) > > > > xprt_wake_pending_tasks(xprt, -sk- > > > > > sk_err); > > > > > > > > /* Trigger the socket release */ > > > > + clear_bit(XPRT_CONNECTED, &xprt->state); > > > > xs_tcp_force_close(xprt); > > > > } > > > > out: > > > > > > Hi Dave, > > > > > > This isn't needed for 4.20 or newer because call_transmit() will > > > now > > > always call xprt_end_transmit(). I suggest that a stable fix do > > > something similar (perhaps conditional on the error returned by > > > xprt_transmit()?). > > > > > > > Can you explain the handling in xs_tcp_state_change - why > > XPRT_CONNECTED would need to remain set longer in the case of > > handling > > TCP_CLOSE case rather than TCP_CLOSE_WAIT? It seems like if we get > > an > > RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED > > bit > > getting cleared need to be delayed in that case? > > > > I will look into the approach you mention though at the moment I do > > not > > see how it helps because the underlying issue seems to be clearing > > of > > the XPRT_CONNECTED bit. > > > > See xprt_clear_locked(). Dropping the XPRT_LOCK will automatically > trigger autoclose if XPRT_CLOSE_WAIT is set, regardless of whether > or > not there are other tasks waiting for the lock. > > Ok thanks for pointing me in that direction I will investigate. I am actually seeing a hang now with the reproducer on 4.20-rc6 but it's not CPU spinning. Investigating as it was not easy to reproduce but there must still be a race somewhere.
On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote: > On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote: > > Commit 9b30889c548a changed the handling of TCP_CLOSE inside > > xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED bit > > was cleared unconditionally inside xprt_disconnect_done, similar > > to the handling of TCP_CLOSE_WAIT. After the change the clearing > > of XPRT_CONNECTED depends on successfully queueing a work based > > xprt_autoclose which depends on XPRT_LOCKED and may not happen. > > This is significant in the case of an unexpected RST from the > > server, as the client will only see xs_tcp_state_change called with > > sk_state == TCP_CLOSE. Restore the unconditional clear_bit on > > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent > > with handling TCP_CLOSE_WAIT. > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > > --- > > net/sunrpc/xprtsock.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > index 8a5e823e0b33..b9789036051d 100644 > > --- a/net/sunrpc/xprtsock.c > > +++ b/net/sunrpc/xprtsock.c > > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock > > *sk) > > if (sk->sk_err) > > xprt_wake_pending_tasks(xprt, -sk- > > >sk_err); > > /* Trigger the socket release */ > > + clear_bit(XPRT_CONNECTED, &xprt->state); > > xs_tcp_force_close(xprt); > > } > > out: > > Hi Dave, > > This isn't needed for 4.20 or newer because call_transmit() will now > always call xprt_end_transmit(). I suggest that a stable fix do > something similar (perhaps conditional on the error returned by > xprt_transmit()?). > > It appears the "Special cases" comment inside call_transmit_status() should be removed now, do you agree? Also I guess you are not concerned about this issue anymore that created the comment and the cases? https://lore.kernel.org/patchwork/patch/165428/ Also for a stable fix, would you prefer handling all of the codes on lines 39-43 the same and removing that comment, or just EPIPE we are seeing in the testcase? After some discussion with Scott we're testing variants on the above (see below for possible fixes). Any preferences on option 1 or option 2? Thanks. Upstream (4.20-rc6) has a comment but it looks wrong due to code in call_transmit(): 1971 static void 1972 call_transmit(struct rpc_task *task) 1973 { 1974 dprint_status(task); 1975 1976 task->tk_status = 0; 1977 if (test_bit(RPC_TASK_NEED_XMIT, &task->tk_runstate)) { 1978 if (!xprt_prepare_transmit(task)) 1979 return; 1980 xprt_transmit(task); 1981 } 1982 task->tk_action = call_transmit_status; 1983 xprt_end_transmit(task); <--------------------- Always drop the XPRT_LOCK 1984 } ... 1989 static void 1990 call_transmit_status(struct rpc_task *task) 1991 { 1992 task->tk_action = call_status; 1993 1994 /* 1995 * Common case: success. Force the compiler to put this 1996 * test first. 1997 */ 1998 if (task->tk_status == 0) { 1999 xprt_request_wait_receive(task); 2000 return; 2001 } 2002 2003 switch (task->tk_status) { 2004 default: 2005 dprint_status(task); 2006 break; 2007 case -EBADMSG: 2008 task->tk_status = 0; 2009 task->tk_action = call_encode; 2010 break; 2011 /* 2012 * Special cases: if we've been waiting on the <---- WRONG 2013 * socket's write_space() callback, or if the <----- WRONG 2014 * socket just returned a connection error, <------- WRONG 2015 * then hold onto the transport lock. <----------- WRONG 2016 */ 2017 case -ENOBUFS: 2018 rpc_delay(task, HZ>>2); 2019 /* fall through */ 2020 case -EBADSLT: 2021 case -EAGAIN: 2022 task->tk_action = call_transmit; 2023 task->tk_status = 0; 2024 break; 2025 case -ECONNREFUSED: 2026 case -EHOSTDOWN: 2027 case -ENETDOWN: 2028 case -EHOSTUNREACH: 2029 case -ENETUNREACH: 2030 case -EPERM: 2031 if (RPC_IS_SOFTCONN(task)) { 2032 if (!task->tk_msg.rpc_proc->p_proc) 2033 trace_xprt_ping(task->tk_xprt, 2034 task->tk_status); 2035 rpc_exit(task, task->tk_status); 2036 break; 2037 } 2038 /* fall through */ 2039 case -ECONNRESET: 2040 case -ECONNABORTED: 2041 case -EADDRINUSE: 2042 case -ENOTCONN: 2043 case -EPIPE: 2044 break; 2045 } 2046 } For 4.16 - 4.19 possible fixes Possible fix #1 - just remove EPIPE and/or all the codes and move these to 'default': 2002 static void 2003 call_transmit_status(struct rpc_task *task) 2004 { 2005 task->tk_action = call_status; 2006 2007 /* 2008 * Common case: success. Force the compiler to put this 2009 * test first. 2010 */ 2011 if (task->tk_status == 0) { 2012 xprt_end_transmit(task); 2013 rpc_task_force_reencode(task); 2014 return; 2015 } 2016 2017 switch (task->tk_status) { 2018 case -EAGAIN: 2019 case -ENOBUFS: 2020 break; 2021 default: 2022 dprint_status(task); 2023 xprt_end_transmit(task); 2024 rpc_task_force_reencode(task); 2025 break; 2026 /* 2027 * Special cases: if we've been waiting on the 2028 * socket's write_space() callback, or if the 2029 * socket just returned a connection error, <----------- WRONG ??? 2030 * then hold onto the transport lock. <-------------- WRONG ??? 2031 */ 2032 case -ECONNREFUSED: 2033 case -EHOSTDOWN: 2034 case -EHOSTUNREACH: 2035 case -ENETUNREACH: 2036 case -EPERM: 2037 if (RPC_IS_SOFTCONN(task)) { 2038 xprt_end_transmit(task); 2039 rpc_exit(task, task->tk_status); 2040 break; 2041 } 2042 case -ECONNRESET: <------- ???? 2043 case -ECONNABORTED: <------ ???? 2044 case -EADDRINUSE: <------- ???? 2045 case -ENOTCONN: <-------- ???? What about these others ??? 2046 case -EPIPE: <------------------------------ remove this line to fix bug 2047 rpc_task_force_reencode(task); 2048 } 2049 } Possible fix #2: make calling xprt_end_transmit() conditional on both EPIPE and xprt_connected() case -EPIPE: + if (xprt_connected(task->tk_xprt)) + xprt_end_transmit(task);
On Wed, 12 Dec 2018, Trond Myklebust wrote: > On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote: > > Commit 9b30889c548a changed the handling of TCP_CLOSE inside > > xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED bit > > was cleared unconditionally inside xprt_disconnect_done, similar > > to the handling of TCP_CLOSE_WAIT. After the change the clearing > > of XPRT_CONNECTED depends on successfully queueing a work based > > xprt_autoclose which depends on XPRT_LOCKED and may not happen. > > This is significant in the case of an unexpected RST from the > > server, as the client will only see xs_tcp_state_change called with > > sk_state == TCP_CLOSE. Restore the unconditional clear_bit on > > XPRT_CONNECTED while handling TCP_CLOSE and make it consistent > > with handling TCP_CLOSE_WAIT. > > > > Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> > > --- > > net/sunrpc/xprtsock.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c > > index 8a5e823e0b33..b9789036051d 100644 > > --- a/net/sunrpc/xprtsock.c > > +++ b/net/sunrpc/xprtsock.c > > @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock > > *sk) > > if (sk->sk_err) > > xprt_wake_pending_tasks(xprt, -sk->sk_err); > > /* Trigger the socket release */ > > + clear_bit(XPRT_CONNECTED, &xprt->state); > > xs_tcp_force_close(xprt); > > } > > out: > > Hi Dave, > > This isn't needed for 4.20 or newer because call_transmit() will now > always call xprt_end_transmit(). Dave's script can also produce a hang on 4.20, but for a different reason... if xs_reset_transport() is called on an xprt that has XPRT_WRITE_SPACE set, then no further RPC tasks will be able to lock the transport or connect it. Clearing XPRT_WRITE_SPACE in xs_sock_reset_connection_flags() seems to fix that. I don't know if XPRT_WRITE_SPACE counts as a "connection flag" though. -Scott > I suggest that a stable fix do > something similar (perhaps conditional on the error returned by > xprt_transmit()?). > > Cheers > Trond > > -- > Trond Myklebust > Linux NFS client maintainer, Hammerspace > trond.myklebust@hammerspace.com > >
On 12 Dec 2018, at 13:02, Trond Myklebust wrote: > On Wed, 2018-12-12 at 12:47 -0500, Dave Wysochanski wrote: >> On Wed, 2018-12-12 at 16:56 +0000, Trond Myklebust wrote: >>> On Wed, 2018-12-12 at 08:51 -0500, Dave Wysochanski wrote: >>>> Commit 9b30889c548a changed the handling of TCP_CLOSE inside >>>> xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED >>>> bit >>>> was cleared unconditionally inside xprt_disconnect_done, similar >>>> to the handling of TCP_CLOSE_WAIT. After the change the clearing >>>> of XPRT_CONNECTED depends on successfully queueing a work based >>>> xprt_autoclose which depends on XPRT_LOCKED and may not happen. >>>> This is significant in the case of an unexpected RST from the >>>> server, as the client will only see xs_tcp_state_change called >>>> with >>>> sk_state == TCP_CLOSE. Restore the unconditional clear_bit on >>>> XPRT_CONNECTED while handling TCP_CLOSE and make it consistent >>>> with handling TCP_CLOSE_WAIT. >>>> >>>> Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> >>>> --- >>>> net/sunrpc/xprtsock.c | 1 + >>>> 1 file changed, 1 insertion(+) >>>> >>>> diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c >>>> index 8a5e823e0b33..b9789036051d 100644 >>>> --- a/net/sunrpc/xprtsock.c >>>> +++ b/net/sunrpc/xprtsock.c >>>> @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock >>>> *sk) >>>> if (sk->sk_err) >>>> xprt_wake_pending_tasks(xprt, -sk- >>>>> sk_err); >>>> /* Trigger the socket release */ >>>> + clear_bit(XPRT_CONNECTED, &xprt->state); >>>> xs_tcp_force_close(xprt); >>>> } >>>> out: >>> >>> Hi Dave, >>> >>> This isn't needed for 4.20 or newer because call_transmit() will >>> now >>> always call xprt_end_transmit(). I suggest that a stable fix do >>> something similar (perhaps conditional on the error returned by >>> xprt_transmit()?). >>> >> >> Can you explain the handling in xs_tcp_state_change - why >> XPRT_CONNECTED would need to remain set longer in the case of >> handling >> TCP_CLOSE case rather than TCP_CLOSE_WAIT? It seems like if we get >> an >> RST we'll go directly to TCP_CLOSE and why would the XPRT_CONNECTED >> bit >> getting cleared need to be delayed in that case? >> >> I will look into the approach you mention though at the moment I do >> not >> see how it helps because the underlying issue seems to be clearing of >> the XPRT_CONNECTED bit. >> > > See xprt_clear_locked(). Dropping the XPRT_LOCK will automatically > trigger autoclose if XPRT_CLOSE_WAIT is set, regardless of whether or > not there are other tasks waiting for the lock. It seems to make sense to drop the XPRT_LOCK in call_transmit_status if XPRT_CLOSE_WAIT is set.. something like: diff --git a/include/linux/sunrpc/xprt.h b/include/linux/sunrpc/xprt.h index 336fd1a19cca..f30bf500888d 100644 --- a/include/linux/sunrpc/xprt.h +++ b/include/linux/sunrpc/xprt.h @@ -443,6 +443,11 @@ static inline int xprt_test_and_set_connecting(struct rpc_xprt *xprt) return test_and_set_bit(XPRT_CONNECTING, &xprt->state); } +static inline int xprt_close_wait(struct rpc_xprt *xprt) +{ + return test_bit(XPRT_CLOSE_WAIT, &xprt->state); +} + static inline void xprt_set_bound(struct rpc_xprt *xprt) { test_and_set_bit(XPRT_BOUND, &xprt->state); diff --git a/net/sunrpc/clnt.c b/net/sunrpc/clnt.c index 8ea2f5fadd96..1fc812ba9871 100644 --- a/net/sunrpc/clnt.c +++ b/net/sunrpc/clnt.c @@ -1992,13 +1992,15 @@ call_transmit(struct rpc_task *task) static void call_transmit_status(struct rpc_task *task) { + struct rpc_xprt *xprt = task->tk_rqstp->rq_xprt; task->tk_action = call_status; /* * Common case: success. Force the compiler to put this - * test first. + * test first. Or, if any error and xprt_close_wait, + * release the xprt lock so the socket can close. */ - if (task->tk_status == 0) { + if (task->tk_status == 0 || xprt_close_wait(xprt)) { xprt_end_transmit(task); rpc_task_force_reencode(task); return; Hmm.. I guess I have to figure out how to get a stable patch submitted that will never have a mainline commit, since we don't have this problem in the mainline. Ben
diff --git a/net/sunrpc/xprtsock.c b/net/sunrpc/xprtsock.c index 8a5e823e0b33..b9789036051d 100644 --- a/net/sunrpc/xprtsock.c +++ b/net/sunrpc/xprtsock.c @@ -1492,6 +1492,7 @@ static void xs_tcp_state_change(struct sock *sk) if (sk->sk_err) xprt_wake_pending_tasks(xprt, -sk->sk_err); /* Trigger the socket release */ + clear_bit(XPRT_CONNECTED, &xprt->state); xs_tcp_force_close(xprt); } out:
Commit 9b30889c548a changed the handling of TCP_CLOSE inside xs_tcp_state_change. Prior to this change, the XPRT_CONNECTED bit was cleared unconditionally inside xprt_disconnect_done, similar to the handling of TCP_CLOSE_WAIT. After the change the clearing of XPRT_CONNECTED depends on successfully queueing a work based xprt_autoclose which depends on XPRT_LOCKED and may not happen. This is significant in the case of an unexpected RST from the server, as the client will only see xs_tcp_state_change called with sk_state == TCP_CLOSE. Restore the unconditional clear_bit on XPRT_CONNECTED while handling TCP_CLOSE and make it consistent with handling TCP_CLOSE_WAIT. Signed-off-by: Dave Wysochanski <dwysocha@redhat.com> --- net/sunrpc/xprtsock.c | 1 + 1 file changed, 1 insertion(+)