Message ID | 20211020232447.9548-1-jmaxwell37@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb() | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Single patches do not need cover letters |
netdev/fixes_present | success | Fixes tag not required for -next series |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net-next |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | success | CCed 6 of 6 maintainers |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Signed-off-by tag matches author and committer |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | fail | Errors and warnings before: 1 this patch: 2 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Fixes tag looks correct |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 8 lines checked |
netdev/build_allmodconfig_warn | fail | Errors and warnings before: 1 this patch: 2 |
netdev/header_inline | success | No static functions without inline keyword in header files |
On Wed, Oct 20, 2021 at 4:25 PM Jon Maxwell <jmaxwell37@gmail.com> wrote: > > A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that > the write_queue was not empty as determined by tcp_write_queue_empty() but the > sk_buff containing the FIN flag had been freed and the socket was zombied in > that state. Corresponding pcaps show no FIN from the Linux kernel on the wire. > > Some instrumentation was added to the kernel and it was found that there is a > timing window where tcp_sendmsg() can run after tcp_send_fin(). > > tcp_sendmsg() will hit an error, for example: > > 1269 ▹ if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩ > 1270 ▹ ▹ goto do_error;↩ > > tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The > TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent. > > If the other side sends a FIN packet the socket will transition to CLOSING and > remain that way until the system is rebooted. > > Fix this by checking for the FIN flag in the sk_buff and don't free it if that > is the case. Testing confirmed that fixed the issue. > > Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") > Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> > --- > net/ipv4/tcp.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index c2d9830136d2..d2b06d8f0c37 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -938,7 +938,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags) > */ > void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) > { > - if (skb && !skb->len) { > + if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { > tcp_unlink_write_queue(skb, sk); > if (tcp_write_queue_empty(sk)) > tcp_chrono_stop(sk, TCP_CHRONO_BUSY); > Very nice catch ! The FIN flag is a really special case here. What we need is to make sure the skb is 'empty' . What about using a single condition ? if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq)
On Thu, Oct 21, 2021 at 12:11 PM Eric Dumazet <edumazet@google.com> wrote: > > On Wed, Oct 20, 2021 at 4:25 PM Jon Maxwell <jmaxwell37@gmail.com> wrote: > > > > A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that > > the write_queue was not empty as determined by tcp_write_queue_empty() but the > > sk_buff containing the FIN flag had been freed and the socket was zombied in > > that state. Corresponding pcaps show no FIN from the Linux kernel on the wire. > > > > Some instrumentation was added to the kernel and it was found that there is a > > timing window where tcp_sendmsg() can run after tcp_send_fin(). > > > > tcp_sendmsg() will hit an error, for example: > > > > 1269 ▹ if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩ > > 1270 ▹ ▹ goto do_error;↩ > > > > tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The > > TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent. > > > > If the other side sends a FIN packet the socket will transition to CLOSING and > > remain that way until the system is rebooted. > > > > Fix this by checking for the FIN flag in the sk_buff and don't free it if that > > is the case. Testing confirmed that fixed the issue. > > > > Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") > > Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> > > --- > > net/ipv4/tcp.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index c2d9830136d2..d2b06d8f0c37 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -938,7 +938,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags) > > */ > > void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) > > { > > - if (skb && !skb->len) { > > + if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { > > tcp_unlink_write_queue(skb, sk); > > if (tcp_write_queue_empty(sk)) > > tcp_chrono_stop(sk, TCP_CHRONO_BUSY); > > > > Very nice catch ! > Thanks Eric. > The FIN flag is a really special case here. > > What we need is to make sure the skb is 'empty' . > > What about using a single condition ? > > if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) Good call as the end_seq will be +1 for a FIN. So that's better. Let me give the customer a kernel with your idea: --- net/ipv4/tcp.c 2021-10-20 22:50:35.836001950 +0530 +++ net/ipv4/tcp.c.patch 2021-10-21 01:42:08.493569483 +0530 @@ -955,7 +955,7 @@ */ void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) { - if (skb && !skb->len) { + if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) { tcp_unlink_write_queue(skb, sk); if (tcp_write_queue_empty(sk)) tcp_chrono_stop(sk, TCP_CHRONO_BUSY); I'll ask the customer to confirm that the v1 patch as above also resolves the issue. Although I expect it will. Then I'll resubmit a v1 patch with your suggestion probably early next week. Regards Jon
Hi Jon, Thank you for the patch! Perhaps something to improve: [auto build test WARNING on net-next/master] url: https://github.com/0day-ci/linux/commits/Jon-Maxwell/tcp-don-t-free-a-FIN-sk_buff-in-tcp_remove_empty_skb/20211021-072644 base: https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 2641b62d2fab52648e34cdc6994b2eacde2d27c1 config: i386-randconfig-a004-20211021 (attached as .config) compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 3cea2505fd8d99a9ba0cb625aecfe28a47c4e3f8) reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/0day-ci/linux/commit/87f5ea309107de5183ec0bd7d0b48ec90546d350 git remote add linux-review https://github.com/0day-ci/linux git fetch --no-tags linux-review Jon-Maxwell/tcp-don-t-free-a-FIN-sk_buff-in-tcp_remove_empty_skb/20211021-072644 git checkout 87f5ea309107de5183ec0bd7d0b48ec90546d350 # save the attached .config to linux build tree COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=i386 If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All warnings (new ones prefixed by >>): >> net/ipv4/tcp.c:941:26: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses] if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { ^ ~ net/ipv4/tcp.c:941:26: note: add parentheses after the '!' to evaluate the bitwise operator first if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { ^ ( ) net/ipv4/tcp.c:941:26: note: add parentheses around left hand side expression to silence this warning if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { ^ ( ) 1 warning generated. vim +941 net/ipv4/tcp.c 932 933 /* In some cases, both sendpage() and sendmsg() could have added 934 * an skb to the write queue, but failed adding payload on it. 935 * We need to remove it to consume less memory, but more 936 * importantly be able to generate EPOLLOUT for Edge Trigger epoll() 937 * users. 938 */ 939 void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) 940 { > 941 if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { 942 tcp_unlink_write_queue(skb, sk); 943 if (tcp_write_queue_empty(sk)) 944 tcp_chrono_stop(sk, TCP_CHRONO_BUSY); 945 sk_wmem_free_skb(sk, skb); 946 } 947 } 948 --- 0-DAY CI Kernel Test Service, Intel Corporation https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index c2d9830136d2..d2b06d8f0c37 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -938,7 +938,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags) */ void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb) { - if (skb && !skb->len) { + if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) { tcp_unlink_write_queue(skb, sk); if (tcp_write_queue_empty(sk)) tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that the write_queue was not empty as determined by tcp_write_queue_empty() but the sk_buff containing the FIN flag had been freed and the socket was zombied in that state. Corresponding pcaps show no FIN from the Linux kernel on the wire. Some instrumentation was added to the kernel and it was found that there is a timing window where tcp_sendmsg() can run after tcp_send_fin(). tcp_sendmsg() will hit an error, for example: 1269 ▹ if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩ 1270 ▹ ▹ goto do_error;↩ tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent. If the other side sends a FIN packet the socket will transition to CLOSING and remain that way until the system is rebooted. Fix this by checking for the FIN flag in the sk_buff and don't free it if that is the case. Testing confirmed that fixed the issue. Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases") Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> --- net/ipv4/tcp.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)