[net] tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets

Message ID 20240501125448.896529-1-edumazet@google.com (mailing list archive)
State Accepted
Commit 94062790aedb505bdda209b10bea47b294d6394f
Delegated to: Netdev Maintainers
Series [net] tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 929 this patch: 929
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           warning  1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang              success  Errors and warnings before: 940 this patch: 940
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  Fixes tag looks correct
netdev/build_allmodconfig_warn  success  Errors and warnings before: 940 this patch: 940
netdev/checkpatch               warning  WARNING: Possible repeated word: 'Google'
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 3 this patch: 3
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-05-02--15-00 (tests: 1000)

Commit Message

Eric Dumazet May 1, 2024, 12:54 p.m. UTC

The TCP_SYN_RECV state is really special: it is only used by
cross-syn connections, which are mostly exercised by fuzzers.

In the following crash [1], syzbot managed to trigger a divide
by zero in tcp_rcv_space_adjust().

A socket makes the following state transitions,
without ever calling tcp_init_transfer(),
meaning tcp_init_buffer_space() is also not called.

         TCP_CLOSE
connect()
         TCP_SYN_SENT
         TCP_SYN_RECV
shutdown() -> tcp_shutdown(sk, SEND_SHUTDOWN)
         TCP_FIN_WAIT1
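
As a rough illustration of the crash mechanism, the following userspace
sketch models a divisor that only the buffer-space initialization sets.
All names (toy_rcv_space, toy_init_buffer_space, toy_rcv_space_grow) are
hypothetical and are not the kernel's structures or functions. The
explicit zero check exists only so the sketch can report the bad state
instead of trapping; the patch itself adds no such check and instead
keeps the socket from reaching this path uninitialized.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for receive-space bookkeeping; not the kernel struct. */
struct toy_rcv_space {
	uint32_t space;	/* divisor; stays 0 until toy_init_buffer_space() runs */
};

/* Models the initialization step that the crashing socket never went through. */
static void toy_init_buffer_space(struct toy_rcv_space *rs, uint32_t initial)
{
	assert(initial != 0);
	rs->space = initial;
}

/*
 * Computes a growth term that divides by rs->space.
 * Returns false, without dividing, when the divisor was never initialized.
 */
static bool toy_rcv_space_grow(const struct toy_rcv_space *rs, uint32_t copied,
			       uint64_t rcvwin, uint64_t *grow)
{
	if (rs->space == 0)
		return false;	/* an unguarded division here would trap */

	if (copied <= rs->space) {
		*grow = 0;
		return true;
	}

	*grow = rcvwin * (copied - rs->space) / rs->space;
	return true;
}
```

In this model, a zero-initialized toy_rcv_space that skips
toy_init_buffer_space() makes toy_rcv_space_grow() return false, which
corresponds to the point where the real code divides by zero.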

To fix this issue, change tcp_shutdown() to not
perform a TCP_SYN_RECV -> TCP_FIN_WAIT1 transition,
which makes no sense anyway.

When tcp_rcv_state_process() later changes the socket state
from TCP_SYN_RECV to TCP_ESTABLISHED, it now looks at
sk->sk_shutdown to finally enter the TCP_FIN_WAIT1 state
and send a FIN packet from a sane socket state.
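
The deferral described above can be pictured with a small userspace
state-machine sketch. It is a simplified model, not kernel code: the
toy_* names are hypothetical, only the states relevant here are handled,
recording the shutdown flag is folded into toy_shutdown(), and the model
assumes transfer initialization happens as part of the move to the
established state.

```c
#include <assert.h>
#include <stdbool.h>

enum toy_state {
	TOY_CLOSE,
	TOY_SYN_SENT,
	TOY_SYN_RECV,
	TOY_ESTABLISHED,
	TOY_FIN_WAIT1,
};

#define TOY_SEND_SHUTDOWN 2	/* models the SEND_SHUTDOWN bit */

struct toy_sock {
	enum toy_state state;
	int shutdown;		/* models sk->sk_shutdown */
	bool transfer_inited;	/* models tcp_init_transfer() having run */
	int fins_sent;
};

/* A FIN must only ever leave a fully initialized socket. */
static void toy_send_fin(struct toy_sock *sk)
{
	assert(sk->transfer_inited);
	sk->fins_sent++;
}

/* Records the request; acts on it only from the established state. */
static void toy_shutdown(struct toy_sock *sk)
{
	sk->shutdown |= TOY_SEND_SHUTDOWN;
	if (sk->state == TOY_ESTABLISHED) {
		sk->state = TOY_FIN_WAIT1;
		toy_send_fin(sk);
	}
	/* In TOY_SYN_RECV the request stays pending. */
}

/* Models the SYN_RECV -> ESTABLISHED step, then replays a pending shutdown. */
static void toy_handshake_done(struct toy_sock *sk)
{
	assert(sk->state == TOY_SYN_RECV);
	sk->state = TOY_ESTABLISHED;
	sk->transfer_inited = true;
	if (sk->shutdown & TOY_SEND_SHUTDOWN)
		toy_shutdown(sk);
}
```

Calling toy_shutdown() on a socket in TOY_SYN_RECV leaves the state
unchanged and sends nothing; a later toy_handshake_done() moves it
through TOY_ESTABLISHED to TOY_FIN_WAIT1 and sends exactly one FIN.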

This means tcp_send_fin() can now be called from BH
context, and must use GFP_ATOMIC allocations.

[1]
divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
CPU: 1 PID: 5084 Comm: syz-executor358 Not tainted 6.9.0-rc6-syzkaller-00022-g98369dccd2f8 #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
 RIP: 0010:tcp_rcv_space_adjust+0x2df/0x890 net/ipv4/tcp_input.c:767
Code: e3 04 4c 01 eb 48 8b 44 24 38 0f b6 04 10 84 c0 49 89 d5 0f 85 a5 03 00 00 41 8b 8e c8 09 00 00 89 e8 29 c8 48 0f af c3 31 d2 <48> f7 f1 48 8d 1c 43 49 8d 96 76 08 00 00 48 89 d0 48 c1 e8 03 48
RSP: 0018:ffffc900031ef3f0 EFLAGS: 00010246
RAX: 0c677a10441f8f42 RBX: 000000004fb95e7e RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: 0000000027d4b11f R08: ffffffff89e535a4 R09: 1ffffffff25e6ab7
R10: dffffc0000000000 R11: ffffffff8135e920 R12: ffff88802a9f8d30
R13: dffffc0000000000 R14: ffff88802a9f8d00 R15: 1ffff1100553f2da
FS:  00005555775c0380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f1155bf2304 CR3: 000000002b9f2000 CR4: 0000000000350ef0
Call Trace:
 <TASK>
  tcp_recvmsg_locked+0x106d/0x25a0 net/ipv4/tcp.c:2513
  tcp_recvmsg+0x25d/0x920 net/ipv4/tcp.c:2578
  inet6_recvmsg+0x16a/0x730 net/ipv6/af_inet6.c:680
  sock_recvmsg_nosec net/socket.c:1046 [inline]
  sock_recvmsg+0x109/0x280 net/socket.c:1068
  ____sys_recvmsg+0x1db/0x470 net/socket.c:2803
  ___sys_recvmsg net/socket.c:2845 [inline]
  do_recvmmsg+0x474/0xae0 net/socket.c:2939
  __sys_recvmmsg net/socket.c:3018 [inline]
  __do_sys_recvmmsg net/socket.c:3041 [inline]
  __se_sys_recvmmsg net/socket.c:3034 [inline]
  __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034
  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
  do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7faeb6363db9
Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ffcc1997168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faeb6363db9
RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000001c
R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000
R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001

Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
Reported-by: syzbot <syzkaller@googlegroups.com>
Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/tcp.c        | 4 ++--
 net/ipv4/tcp_input.c  | 2 ++
 net/ipv4/tcp_output.c | 4 +++-
 3 files changed, 7 insertions(+), 3 deletions(-)

Comments

Neal Cardwell May 1, 2024, 8:27 p.m. UTC | #1
On Wed, May 1, 2024 at 8:54 AM Eric Dumazet <edumazet@google.com> wrote:
>
> TCP_SYN_RECV state is really special, it is only used by
> cross-syn connections, mostly used by fuzzers.
>
> In the following crash [1], syzbot managed to trigger a divide
> by zero in tcp_rcv_space_adjust()
>
> A socket makes the following state transitions,
> without ever calling tcp_init_transfer(),
> meaning tcp_init_buffer_space() is also not called.
>
>          TCP_CLOSE
> connect()
>          TCP_SYN_SENT
>          TCP_SYN_RECV
> shutdown() -> tcp_shutdown(sk, SEND_SHUTDOWN)
>          TCP_FIN_WAIT1
>
> To fix this issue, change tcp_shutdown() to not
> perform a TCP_SYN_RECV -> TCP_FIN_WAIT1 transition,
> which makes no sense anyway.
>
> When tcp_rcv_state_process() later changes socket state
> from TCP_SYN_RECV to TCP_ESTABLISH, then look at
> sk->sk_shutdown to finally enter TCP_FIN_WAIT1 state,
> and send a FIN packet from a sane socket state.
>
> This means tcp_send_fin() can now be called from BH
> context, and must use GFP_ATOMIC allocations.
>
> [1]
> divide error: 0000 [#1] PREEMPT SMP KASAN NOPTI
> CPU: 1 PID: 5084 Comm: syz-executor358 Not tainted 6.9.0-rc6-syzkaller-00022-g98369dccd2f8 #0
> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024
>  RIP: 0010:tcp_rcv_space_adjust+0x2df/0x890 net/ipv4/tcp_input.c:767
> Code: e3 04 4c 01 eb 48 8b 44 24 38 0f b6 04 10 84 c0 49 89 d5 0f 85 a5 03 00 00 41 8b 8e c8 09 00 00 89 e8 29 c8 48 0f af c3 31 d2 <48> f7 f1 48 8d 1c 43 49 8d 96 76 08 00 00 48 89 d0 48 c1 e8 03 48
> RSP: 0018:ffffc900031ef3f0 EFLAGS: 00010246
> RAX: 0c677a10441f8f42 RBX: 000000004fb95e7e RCX: 0000000000000000
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> RBP: 0000000027d4b11f R08: ffffffff89e535a4 R09: 1ffffffff25e6ab7
> R10: dffffc0000000000 R11: ffffffff8135e920 R12: ffff88802a9f8d30
> R13: dffffc0000000000 R14: ffff88802a9f8d00 R15: 1ffff1100553f2da
> FS:  00005555775c0380(0000) GS:ffff8880b9500000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007f1155bf2304 CR3: 000000002b9f2000 CR4: 0000000000350ef0
> Call Trace:
>  <TASK>
>   tcp_recvmsg_locked+0x106d/0x25a0 net/ipv4/tcp.c:2513
>   tcp_recvmsg+0x25d/0x920 net/ipv4/tcp.c:2578
>   inet6_recvmsg+0x16a/0x730 net/ipv6/af_inet6.c:680
>   sock_recvmsg_nosec net/socket.c:1046 [inline]
>   sock_recvmsg+0x109/0x280 net/socket.c:1068
>   ____sys_recvmsg+0x1db/0x470 net/socket.c:2803
>   ___sys_recvmsg net/socket.c:2845 [inline]
>   do_recvmmsg+0x474/0xae0 net/socket.c:2939
>   __sys_recvmmsg net/socket.c:3018 [inline]
>   __do_sys_recvmmsg net/socket.c:3041 [inline]
>   __se_sys_recvmmsg net/socket.c:3034 [inline]
>   __x64_sys_recvmmsg+0x199/0x250 net/socket.c:3034
>   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>   do_syscall_64+0xf5/0x240 arch/x86/entry/common.c:83
>  entry_SYSCALL_64_after_hwframe+0x77/0x7f
> RIP: 0033:0x7faeb6363db9
> Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 c1 17 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48
> RSP: 002b:00007ffcc1997168 EFLAGS: 00000246 ORIG_RAX: 000000000000012b
> RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007faeb6363db9
> RDX: 0000000000000001 RSI: 0000000020000bc0 RDI: 0000000000000005
> RBP: 0000000000000000 R08: 0000000000000000 R09: 000000000000001c
> R10: 0000000000000122 R11: 0000000000000246 R12: 0000000000000000
> R13: 0000000000000000 R14: 0000000000000001 R15: 0000000000000001
>
> Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> Reported-by: syzbot <syzkaller@googlegroups.com>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---

Very nice find and fix! Thanks, Eric!

Acked-by: Neal Cardwell <ncardwell@google.com>

neal

patchwork-bot+netdevbpf@kernel.org May 3, 2024, 2:10 a.m. UTC | #2
Hello:

This patch was applied to netdev/net.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  1 May 2024 12:54:48 +0000 you wrote:
> TCP_SYN_RECV state is really special, it is only used by
> cross-syn connections, mostly used by fuzzers.
> 
> In the following crash [1], syzbot managed to trigger a divide
> by zero in tcp_rcv_space_adjust()
> 
> A socket makes the following state transitions,
> without ever calling tcp_init_transfer(),
> meaning tcp_init_buffer_space() is also not called.
> 
> [...]

Here is the summary with links:
  - [net] tcp: defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets
    https://git.kernel.org/netdev/net/c/94062790aedb

You are awesome, thank you!

Patch

diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index e767721b3a588b5d56567ae7badf5dffcd35a76a..66d77faca64f6db95e04f4c0e7dd3892628ae3f7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2710,7 +2710,7 @@  void tcp_shutdown(struct sock *sk, int how)
 	/* If we've already sent a FIN, or it's a closed state, skip this. */
 	if ((1 << sk->sk_state) &
 	    (TCPF_ESTABLISHED | TCPF_SYN_SENT |
-	     TCPF_SYN_RECV | TCPF_CLOSE_WAIT)) {
+	     TCPF_CLOSE_WAIT)) {
 		/* Clear out any half completed packets.  FIN if needed. */
 		if (tcp_close_state(sk))
 			tcp_send_fin(sk);
@@ -2819,7 +2819,7 @@  void __tcp_close(struct sock *sk, long timeout)
 		 * machine. State transitions:
 		 *
 		 * TCP_ESTABLISHED -> TCP_FIN_WAIT1
-		 * TCP_SYN_RECV	-> TCP_FIN_WAIT1 (forget it, it's impossible)
+		 * TCP_SYN_RECV	-> TCP_FIN_WAIT1 (it is difficult)
 		 * TCP_CLOSE_WAIT -> TCP_LAST_ACK
 		 *
 		 * are legal only when FIN has been sent (i.e. in window),
diff --git a/net/ipv4/tcp_input.c b/net/ipv4/tcp_input.c
index 5d874817a78db31a4a807ab80e9158300329423d..a140d9f7a0a36e6a0b90c97a44a1e54e7639c71f 100644
--- a/net/ipv4/tcp_input.c
+++ b/net/ipv4/tcp_input.c
@@ -6761,6 +6761,8 @@  tcp_rcv_state_process(struct sock *sk, struct sk_buff *skb)
 
 		tcp_initialize_rcv_mss(sk);
 		tcp_fast_path_on(tp);
+		if (sk->sk_shutdown & SEND_SHUTDOWN)
+			tcp_shutdown(sk, SEND_SHUTDOWN);
 		break;
 
 	case TCP_FIN_WAIT1: {
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index e3167ad965676facaacd8f82848c52cf966f97c3..02caeb7bcf6342713019d31891998fdbe426b573 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -3563,7 +3563,9 @@  void tcp_send_fin(struct sock *sk)
 			return;
 		}
 	} else {
-		skb = alloc_skb_fclone(MAX_TCP_HEADER, sk->sk_allocation);
+		skb = alloc_skb_fclone(MAX_TCP_HEADER,
+				       sk_gfp_mask(sk, GFP_ATOMIC |
+						       __GFP_NOWARN));
 		if (unlikely(!skb))
 			return;