diff mbox series

[net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb()

Message ID 20211020232447.9548-1-jmaxwell37@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net-next] tcp: don't free a FIN sk_buff in tcp_remove_empty_skb() | expand

Checks

Context Check Description
netdev/cover_letter success Single patches do not need cover letters
netdev/fixes_present success Fixes tag not required for -next series
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/module_param success Was 0 now: 0
netdev/build_32bit fail Errors and warnings before: 1 this patch: 2
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Fixes tag looks correct
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn fail Errors and warnings before: 1 this patch: 2
netdev/header_inline success No static functions without inline keyword in header files

Commit Message

Jonathan Maxwell Oct. 20, 2021, 11:24 p.m. UTC
A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that 
the write_queue was not empty as determined by tcp_write_queue_empty() but the 
sk_buff containing the FIN flag had been freed and the socket was zombied in 
that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.

Some instrumentation was added to the kernel and it was found that there is a 
timing window where tcp_sendmsg() can run after tcp_send_fin().

tcp_sendmsg() will hit an error, for example:

1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
1270 ▹       ▹       goto do_error;↩

tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.

If the other side sends a FIN packet the socket will transition to CLOSING and
remain that way until the system is rebooted.

Fix this by checking for the FIN flag in the sk_buff and don't free it if that 
is the case. Testing confirmed that fixed the issue.

Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
---
 net/ipv4/tcp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Eric Dumazet Oct. 21, 2021, 1:10 a.m. UTC | #1
On Wed, Oct 20, 2021 at 4:25 PM Jon Maxwell <jmaxwell37@gmail.com> wrote:
>
> A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that
> the write_queue was not empty as determined by tcp_write_queue_empty() but the
> sk_buff containing the FIN flag had been freed and the socket was zombied in
> that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.
>
> Some instrumentation was added to the kernel and it was found that there is a
> timing window where tcp_sendmsg() can run after tcp_send_fin().
>
> tcp_sendmsg() will hit an error, for example:
>
> 1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
> 1270 ▹       ▹       goto do_error;↩
>
> tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
> TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.
>
> If the other side sends a FIN packet the socket will transition to CLOSING and
> remain that way until the system is rebooted.
>
> Fix this by checking for the FIN flag in the sk_buff and don't free it if that
> is the case. Testing confirmed that fixed the issue.
>
> Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
> ---
>  net/ipv4/tcp.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index c2d9830136d2..d2b06d8f0c37 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -938,7 +938,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
>   */
>  void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
>  {
> -       if (skb && !skb->len) {
> +       if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
>                 tcp_unlink_write_queue(skb, sk);
>                 if (tcp_write_queue_empty(sk))
>                         tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
>

Very nice catch !

The FIN flag is a really special case here.

What we need is to make sure the skb is 'empty' .

What about using a single condition ?

if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq)
Jonathan Maxwell Oct. 21, 2021, 1:48 a.m. UTC | #2
On Thu, Oct 21, 2021 at 12:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Wed, Oct 20, 2021 at 4:25 PM Jon Maxwell <jmaxwell37@gmail.com> wrote:
> >
> > A customer reported sockets stuck in the CLOSING state. A Vmcore revealed that
> > the write_queue was not empty as determined by tcp_write_queue_empty() but the
> > sk_buff containing the FIN flag had been freed and the socket was zombied in
> > that state. Corresponding pcaps show no FIN from the Linux kernel on the wire.
> >
> > Some instrumentation was added to the kernel and it was found that there is a
> > timing window where tcp_sendmsg() can run after tcp_send_fin().
> >
> > tcp_sendmsg() will hit an error, for example:
> >
> > 1269 ▹       if (sk->sk_err || (sk->sk_shutdown & SEND_SHUTDOWN))↩
> > 1270 ▹       ▹       goto do_error;↩
> >
> > tcp_remove_empty_skb() will then free the FIN sk_buff as "skb->len == 0". The
> > TCP socket is now wedged in the FIN-WAIT-1 state because the FIN is never sent.
> >
> > If the other side sends a FIN packet the socket will transition to CLOSING and
> > remain that way until the system is rebooted.
> >
> > Fix this by checking for the FIN flag in the sk_buff and don't free it if that
> > is the case. Testing confirmed that fixed the issue.
> >
> > Fixes: fdfc5c8594c2 ("tcp: remove empty skb from write queue in error cases")
> > Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com>
> > ---
> >  net/ipv4/tcp.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> > index c2d9830136d2..d2b06d8f0c37 100644
> > --- a/net/ipv4/tcp.c
> > +++ b/net/ipv4/tcp.c
> > @@ -938,7 +938,7 @@ int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
> >   */
> >  void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
> >  {
> > -       if (skb && !skb->len) {
> > +       if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
> >                 tcp_unlink_write_queue(skb, sk);
> >                 if (tcp_write_queue_empty(sk))
> >                         tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
> >
>
> Very nice catch !
>

Thanks Eric.

> The FIN flag is a really special case here.
>
> What we need is to make sure the skb is 'empty' .
>
> What about using a single condition ?
>
> if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq)

Good call as the end_seq will be +1 for a FIN. So that's better.

Let me give the customer a kernel with your idea:

--- net/ipv4/tcp.c 2021-10-20 22:50:35.836001950 +0530
+++ net/ipv4/tcp.c.patch 2021-10-21 01:42:08.493569483 +0530
@@ -955,7 +955,7 @@
  */
 void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 {
- if (skb && !skb->len) {
+ if (skb && TCP_SKB_CB(skb)->seq == TCP_SKB_CB(skb)->end_seq) {
  tcp_unlink_write_queue(skb, sk);
  if (tcp_write_queue_empty(sk))
  tcp_chrono_stop(sk, TCP_CHRONO_BUSY);

I'll ask the customer to confirm that the v1 patch as above also resolves
the issue. Although I expect it will.

Then I'll resubmit a v1 patch with your suggestion probably early next week.

Regards

Jon
kernel test robot Oct. 21, 2021, 5:35 a.m. UTC | #3
Hi Jon,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on net-next/master]

url:    https://github.com/0day-ci/linux/commits/Jon-Maxwell/tcp-don-t-free-a-FIN-sk_buff-in-tcp_remove_empty_skb/20211021-072644
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git 2641b62d2fab52648e34cdc6994b2eacde2d27c1
config: i386-randconfig-a004-20211021 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 3cea2505fd8d99a9ba0cb625aecfe28a47c4e3f8)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # https://github.com/0day-ci/linux/commit/87f5ea309107de5183ec0bd7d0b48ec90546d350
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jon-Maxwell/tcp-don-t-free-a-FIN-sk_buff-in-tcp_remove_empty_skb/20211021-072644
        git checkout 87f5ea309107de5183ec0bd7d0b48ec90546d350
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

>> net/ipv4/tcp.c:941:26: warning: logical not is only applied to the left hand side of this bitwise operator [-Wlogical-not-parentheses]
           if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
                                   ^                           ~
   net/ipv4/tcp.c:941:26: note: add parentheses after the '!' to evaluate the bitwise operator first
           if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
                                   ^
                                    (                                      )
   net/ipv4/tcp.c:941:26: note: add parentheses around left hand side expression to silence this warning
           if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
                                   ^
                                   (                          )
   1 warning generated.


vim +941 net/ipv4/tcp.c

   932	
   933	/* In some cases, both sendpage() and sendmsg() could have added
   934	 * an skb to the write queue, but failed adding payload on it.
   935	 * We need to remove it to consume less memory, but more
   936	 * importantly be able to generate EPOLLOUT for Edge Trigger epoll()
   937	 * users.
   938	 */
   939	void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
   940	{
 > 941		if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
   942			tcp_unlink_write_queue(skb, sk);
   943			if (tcp_write_queue_empty(sk))
   944				tcp_chrono_stop(sk, TCP_CHRONO_BUSY);
   945			sk_wmem_free_skb(sk, skb);
   946		}
   947	}
   948	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
diff mbox series

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index c2d9830136d2..d2b06d8f0c37 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -938,7 +938,7 @@  int tcp_send_mss(struct sock *sk, int *size_goal, int flags)
  */
 void tcp_remove_empty_skb(struct sock *sk, struct sk_buff *skb)
 {
-	if (skb && !skb->len) {
+	if (skb && !skb->len && !TCP_SKB_CB(skb)->tcp_flags & TCPHDR_FIN) {
 		tcp_unlink_write_queue(skb, sk);
 		if (tcp_write_queue_empty(sk))
 			tcp_chrono_stop(sk, TCP_CHRONO_BUSY);