[net] tcp: ensure sk_shutdown is 0 for listening sockets

Message ID 8db98a8fbf2ac673b355651852093579a913f3f1.1716199422.git.pabeni@redhat.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Series [net] tcp: ensure sk_shutdown is 0 for listening sockets

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag present in non-next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 910 this patch: 910
netdev/build_tools              success  No tools touched, skip
netdev/cc_maintainers           success  CCed 5 of 5 maintainers
netdev/build_clang              success  Errors and warnings before: 909 this patch: 909
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             fail     Problems with Fixes tag: 3
netdev/build_allmodconfig_warn  success  Errors and warnings before: 914 this patch: 914
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 2 this patch: 2
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2024-05-21--21-00 (tests: 1039)

Commit Message

Paolo Abeni May 20, 2024, 10:04 a.m. UTC
Christoph reported the following splat:

WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
Modules linked in:
CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
FS:  000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
PKRU: 55555554
Call Trace:
 <TASK>
 inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
 do_accept+0x435/0x620 net/socket.c:1929
 __sys_accept4_file net/socket.c:1969 [inline]
 __sys_accept4+0x9b/0x110 net/socket.c:1999
 __do_sys_accept net/socket.c:2016 [inline]
 __se_sys_accept net/socket.c:2013 [inline]
 __x64_sys_accept+0x7d/0x90 net/socket.c:2013
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x76/0x7e
RIP: 0033:0x4315f9
Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
 </TASK>

Listener sockets are supposed to have a zero sk_shutdown, because
accepted children inherit that field.

Invoking shutdown() on a socket before it enters the listening state
violates this constraint.

After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
TCP_SYN_RECV sockets"), this causes the child to reach the accept
syscall in FIN_WAIT1 state.

Address the issue explicitly by clearing sk_shutdown at listen time.

Reported-by: Christoph Paasch <cpaasch@apple.com>
Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
Fixes: 1da177e4c3fu ("Linux-2.6.12-rc2")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
Note: the issue above reports an MPTCP reproducer, but I can reproduce
the issue even using plain TCP sockets only.
---
 net/ipv4/inet_connection_sock.c | 2 ++
 1 file changed, 2 insertions(+)
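
The inheritance described in the commit message can be modelled in
userspace. What follows is a toy sketch, not kernel code: struct toy_sock
and the toy_* helpers are hypothetical names made up for illustration.
Only the state numbers and the shutdown flag values (RCV_SHUTDOWN = 1,
SEND_SHUTDOWN = 2) are meant to mirror the kernel's definitions. The
clear_shutdown argument stands in for the two-line change proposed here.

```c
#include <stdbool.h>

/* Toy stand-ins for a few kernel TCP states and shutdown flags. */
enum toy_state {
	TOY_ESTABLISHED = 1,
	TOY_SYN_RECV    = 3,
	TOY_FIN_WAIT1   = 4,
	TOY_CLOSE       = 7,
	TOY_LISTEN      = 10,
};
#define TOY_RCV_SHUTDOWN  1
#define TOY_SEND_SHUTDOWN 2

struct toy_sock {
	int state;
	unsigned char shutdown;	/* models sk->sk_shutdown */
};

/* shutdown() on a closed socket still records the flag. */
void toy_shutdown(struct toy_sock *sk, unsigned char how)
{
	sk->shutdown |= how;
}

/* listen(); clear_shutdown models clearing sk_shutdown at listen time. */
void toy_listen(struct toy_sock *sk, bool clear_shutdown)
{
	if (clear_shutdown)
		sk->shutdown = 0;
	sk->state = TOY_LISTEN;
}

/*
 * Handshake completion: the child copies the listener's shutdown mask.
 * With SEND_SHUTDOWN set, the deferred shutdown runs and the child is
 * already in FIN_WAIT1 when accept() picks it up.
 */
struct toy_sock toy_complete_handshake(const struct toy_sock *listener)
{
	struct toy_sock child = {
		.state = TOY_SYN_RECV,
		.shutdown = listener->shutdown,
	};

	child.state = (child.shutdown & TOY_SEND_SHUTDOWN) ?
		      TOY_FIN_WAIT1 : TOY_ESTABLISHED;
	return child;
}
```

In this model, a listener that was shut down before listen() hands out
children in FIN_WAIT1 unless the mask is cleared at listen time.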

Comments

Simon Horman May 20, 2024, 1 p.m. UTC | #1
On Mon, May 20, 2024 at 12:04:47PM +0200, Paolo Abeni wrote:
> Christoph reported the following splat:
> 
> [...]
> 
> Listener sockets are supposed to have a zero sk_shutdown, as the
> accepted children will inherit such field.
> 
> Invoking shutdown() before entering the listener status allows
> violating the above constraint.
> 
> After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
> TCP_SYN_RECV sockets"), the above causes the child to reach the accept
> syscall in FIN_WAIT1 status.
> 
> Address the issue explicitly by clearing sk_shutdown at listen time.
> 
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
> Fixes: 1da177e4c3fu ("Linux-2.6.12-rc2")

nit: 1da177e4c3f

> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

...
Eric Dumazet May 20, 2024, 1:46 p.m. UTC | #2
On Mon, May 20, 2024 at 12:05 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Christoph reported the following splat:
>
> [...]
>
> Listener sockets are supposed to have a zero sk_shutdown, as the
> accepted children will inherit such field.
>
> Invoking shutdown() before entering the listener status allows
> violating the above constraint.
>
> After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
> TCP_SYN_RECV sockets"), the above causes the child to reach the accept
> syscall in FIN_WAIT1 status.
>
> Address the issue explicitly by clearing sk_shutdown at listen time.
>
> Reported-by: Christoph Paasch <cpaasch@apple.com>
> Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
> Fixes: 1da177e4c3fu ("Linux-2.6.12-rc2")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
> Note: the issue above reports an MPTCP reproducer, but I can reproduce
> the issue even using plain TCP sockets only.
> ---
>  net/ipv4/inet_connection_sock.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> index 3b38610958ee..dab723fea0cc 100644
> --- a/net/ipv4/inet_connection_sock.c
> +++ b/net/ipv4/inet_connection_sock.c
> @@ -1269,6 +1269,8 @@ int inet_csk_listen_start(struct sock *sk)
>
>         reqsk_queue_alloc(&icsk->icsk_accept_queue);
>
> +       /* closed sockets can have non zero sk_shutdown */
> +       WRITE_ONCE(sk->sk_shutdown, 0);

Hi Paolo.

I am unsure about your patch. I had an internal syzbot report about
this before going OOO for a few days, and my first reaction was to
change the WARN in inet_accept().

Perhaps some applications are relying on calling shutdown() before listen()...
Eric Dumazet May 20, 2024, 2:07 p.m. UTC | #3
On Mon, May 20, 2024 at 3:46 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Mon, May 20, 2024 at 12:05 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > Christoph reported the following splat:
> >
> > [...]
> >
> > Listener sockets are supposed to have a zero sk_shutdown, as the
> > accepted children will inherit such field.
> >
> > Invoking shutdown() before entering the listener status allows
> > violating the above constraint.
> >
> > After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
> > TCP_SYN_RECV sockets"), the above causes the child to reach the accept
> > syscall in FIN_WAIT1 status.
> >
> > Address the issue explicitly by clearing sk_shutdown at listen time.
> >
> > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
> > Fixes: 1da177e4c3fu ("Linux-2.6.12-rc2")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> > Note: the issue above reports an MPTCP reproducer, but I can reproduce
> > the issue even using plain TCP sockets only.
> > ---
> >  net/ipv4/inet_connection_sock.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > index 3b38610958ee..dab723fea0cc 100644
> > --- a/net/ipv4/inet_connection_sock.c
> > +++ b/net/ipv4/inet_connection_sock.c
> > @@ -1269,6 +1269,8 @@ int inet_csk_listen_start(struct sock *sk)
> >
> >         reqsk_queue_alloc(&icsk->icsk_accept_queue);
> >
> > +       /* closed sockets can have non zero sk_shutdown */
> > +       WRITE_ONCE(sk->sk_shutdown, 0);
>
> Hi Paolo.
>
> I am unsure about your patch, I had an internal syzbot report about
> this before going OOO for a few days,
> and my first reaction was to change the WARN in inet_accept().
>
> Perhaps some applications are relying on calling shutdown() before listen()...

BTW the syzbot repro was

r0 = socket$inet6_tcp(0xa, 0x1, 0x0)
sendto$inet6(0xffffffffffffffff, 0x0, 0x0, 0x20000004, 0x0, 0x0)
shutdown(r0, 0x1)
bind$inet6(r0, &(0x7f0000000040)={0xa, 0x4e22, 0x0, @empty}, 0x1c)
listen(r0, 0x0)
r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
connect$inet(r1, &(0x7f0000000000)={0x2, 0x4e22, @local}, 0x10)
accept(r0, 0x0, 0x0)
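
For readers less used to syzkaller syntax, the raw numbers in the repro
above correspond to the usual socket constants on Linux: 0xa is AF_INET6,
0x2 is AF_INET, 0x1 is SOCK_STREAM (and SHUT_WR in the shutdown() call),
and 0x106 is IPPROTO_MPTCP. A small sketch that checks this mapping;
repro_constants_match() is a hypothetical helper, and the fallback define
is only an assumption for libc headers that predate MPTCP.

```c
#include <netinet/in.h>
#include <sys/socket.h>

/* Older libc headers may lack IPPROTO_MPTCP; 262 is the Linux value. */
#ifndef IPPROTO_MPTCP
#define IPPROTO_MPTCP 262
#endif

/* Returns 1 if the symbolic names match the raw values in the repro. */
int repro_constants_match(void)
{
	return AF_INET6 == 0xa &&	/* socket$inet6_tcp(0xa, ...)      */
	       AF_INET == 0x2 &&	/* socket$inet_mptcp(0x2, ...)     */
	       SOCK_STREAM == 0x1 &&	/* second argument of both sockets */
	       SHUT_WR == 0x1 &&	/* shutdown(r0, 0x1)               */
	       IPPROTO_MPTCP == 0x106;	/* third argument of the client    */
}
```

Read this way, the sequence is: shut down the write side of a fresh TCP
socket, bind and listen on it, connect an MPTCP client, then accept.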
Paolo Abeni May 20, 2024, 2:46 p.m. UTC | #4
Hi,

On Mon, 2024-05-20 at 16:07 +0200, Eric Dumazet wrote:
> On Mon, May 20, 2024 at 3:46 PM Eric Dumazet <edumazet@google.com> wrote:
> > 
> > On Mon, May 20, 2024 at 12:05 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > 
> > > Christoph reported the following splat:
> > > 
> > > [...]
> > > 
> > > Listener sockets are supposed to have a zero sk_shutdown, as the
> > > accepted children will inherit such field.
> > > 
> > > Invoking shutdown() before entering the listener status allows
> > > violating the above constraint.
> > > 
> > > After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
> > > TCP_SYN_RECV sockets"), the above causes the child to reach the accept
> > > syscall in FIN_WAIT1 status.
> > > 
> > > Address the issue explicitly by clearing sk_shutdown at listen time.
> > > 
> > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
> > > Fixes: 1da177e4c3fu ("Linux-2.6.12-rc2")
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > > Note: the issue above reports an MPTCP reproducer, but I can reproduce
> > > the issue even using plain TCP sockets only.
> > > ---
> > >  net/ipv4/inet_connection_sock.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > index 3b38610958ee..dab723fea0cc 100644
> > > --- a/net/ipv4/inet_connection_sock.c
> > > +++ b/net/ipv4/inet_connection_sock.c
> > > @@ -1269,6 +1269,8 @@ int inet_csk_listen_start(struct sock *sk)
> > > 
> > >         reqsk_queue_alloc(&icsk->icsk_accept_queue);
> > > 
> > > +       /* closed sockets can have non zero sk_shutdown */
> > > +       WRITE_ONCE(sk->sk_shutdown, 0);
> > 
> > Hi Paolo.
> > 
> > I am unsure about your patch, I had an internal syzbot report about
> > this before going OOO for a few days,
> > and my first reaction was to change the WARN in inet_accept().
> > 
> > Perhaps some applications are relying on calling shutdown() before listen()...

Uhmm, right, I did not consider that a non-zero sk_shutdown would have
affected recvmsg() and sendmsg() even prior to 94062790aedb ("tcp:
defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets").

> BTW the syzbot repro was
> 
> r0 = socket$inet6_tcp(0xa, 0x1, 0x0)
> sendto$inet6(0xffffffffffffffff, 0x0, 0x0, 0x20000004, 0x0, 0x0)
> shutdown(r0, 0x1)
> bind$inet6(r0, &(0x7f0000000040)={0xa, 0x4e22, 0x0, @empty}, 0x1c)
> listen(r0, 0x0)
> r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
> connect$inet(r1, &(0x7f0000000000)={0x2, 0x4e22, @local}, 0x10)
> accept(r0, 0x0, 0x0)

The above is very similar to what Christoph reported. It should splat
even when replacing 0x106 with 0 (mptcp -> tcp).

I'm fine with relaxing the check in __inet_accept(). Do you prefer to
send the patch yourself, or should I send a v2? The condition should be

        WARN_ON(!((1 << newsk->sk_state) &
                  (TCPF_ESTABLISHED | TCPF_SYN_RECV |
                   TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 |
                   TCPF_CLOSING | TCPF_CLOSE_WAIT |
                   TCPF_CLOSE)));

I guess.

Thanks!

Paolo
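
The condition proposed in the message above is a bitmask membership test:
each state number selects one bit, and the WARN fires when that bit is
not in the allowed set. A standalone sketch of the idea follows. The
ST_* names, STF() and state_allowed() are hypothetical; the state numbers
are meant to mirror the kernel's TCP state enum. MASK_OLD is an
assumption about what the check allowed before the change, since the
thread does not quote it.

```c
#include <stdbool.h>

/* State numbering, starting at 1 as in the kernel's TCP state enum. */
enum {
	ST_ESTABLISHED = 1, ST_SYN_SENT, ST_SYN_RECV, ST_FIN_WAIT1,
	ST_FIN_WAIT2, ST_TIME_WAIT, ST_CLOSE, ST_CLOSE_WAIT,
	ST_LAST_ACK, ST_LISTEN, ST_CLOSING,
};

/* One bit per state, like the kernel's TCPF_* flags. */
#define STF(s) (1u << (s))

/* Assumed pre-change set of states tolerated at accept time. */
#define MASK_OLD	(STF(ST_ESTABLISHED) | STF(ST_SYN_RECV) | \
			 STF(ST_CLOSE_WAIT) | STF(ST_CLOSE))

/* The relaxed set proposed in the thread. */
#define MASK_RELAXED	(STF(ST_ESTABLISHED) | STF(ST_SYN_RECV) | \
			 STF(ST_FIN_WAIT1) | STF(ST_FIN_WAIT2) | \
			 STF(ST_CLOSING) | STF(ST_CLOSE_WAIT) | \
			 STF(ST_CLOSE))

/* True when the state's bit is present in the mask (no warning). */
bool state_allowed(int state, unsigned int mask)
{
	return (STF(state) & mask) != 0;
}
```

Under these assumptions, a child in FIN_WAIT1 trips the old check but
passes the relaxed one, while LISTEN is rejected by both.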
Eric Dumazet May 20, 2024, 2:49 p.m. UTC | #5
On Mon, May 20, 2024 at 4:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Mon, 2024-05-20 at 16:07 +0200, Eric Dumazet wrote:
> > On Mon, May 20, 2024 at 3:46 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, May 20, 2024 at 12:05 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > >
> > > > Christoph reported the following splat:
> > > >
> > > > [...]
> > > >
> > > > Listener sockets are supposed to have a zero sk_shutdown, as the
> > > > accepted children will inherit such field.
> > > >
> > > > Invoking shutdown() before entering the listener status allows
> > > > violating the above constraint.
> > > >
> > > > After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
> > > > TCP_SYN_RECV sockets"), the above causes the child to reach the accept
> > > > syscall in FIN_WAIT1 status.
> > > >
> > > > Address the issue explicitly by clearing sk_shutdown at listen time.
> > > >
> > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
> > > > Fixes: 1da177e4c3fu ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > > Note: the issue above reports an MPTCP reproducer, but I can reproduce
> > > > the issue even using plain TCP sockets only.
> > > > ---
> > > >  net/ipv4/inet_connection_sock.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > > index 3b38610958ee..dab723fea0cc 100644
> > > > --- a/net/ipv4/inet_connection_sock.c
> > > > +++ b/net/ipv4/inet_connection_sock.c
> > > > @@ -1269,6 +1269,8 @@ int inet_csk_listen_start(struct sock *sk)
> > > >
> > > >         reqsk_queue_alloc(&icsk->icsk_accept_queue);
> > > >
> > > > +       /* closed sockets can have non zero sk_shutdown */
> > > > +       WRITE_ONCE(sk->sk_shutdown, 0);
> > >
> > > Hi Paolo.
> > >
> > > I am unsure about your patch, I had an internal syzbot report about
> > > this before going OOO for a few days,
> > > and my first reaction was to change the WARN in inet_accept().
> > >
> > > Perhaps some applications are relying on calling shutdown() before listen()...
>
> Uhmm, right I did not consider that a non zero sk_shutdown would have
> affected recvmsg() and sendmsg() even prior to 94062790aedb ("tcp:
> defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets").
>
> > BTW the syzbot repro was
> >
> > r0 = socket$inet6_tcp(0xa, 0x1, 0x0)
> > sendto$inet6(0xffffffffffffffff, 0x0, 0x0, 0x20000004, 0x0, 0x0)
> > shutdown(r0, 0x1)
> > bind$inet6(r0, &(0x7f0000000040)={0xa, 0x4e22, 0x0, @empty}, 0x1c)
> > listen(r0, 0x0)
> > r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
> > connect$inet(r1, &(0x7f0000000000)={0x2, 0x4e22, @local}, 0x10)
> > accept(r0, 0x0, 0x0)
>
> The above is very similar to what Christoph reported. It should splat
> even replacing 0x106 with 0 (mptcp -> tcp).
>
> I'm fine with relaxing the check in __inet_accept(). Do you prefer send
> to patch yourself, or me to send a v2? The condition should be
>
>         WARN_ON(!((1 << newsk->sk_state) &
>                   (TCPF_ESTABLISHED | TCPF_SYN_RECV |
>                    TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 |
>                    TCPF_CLOSING | TCPF_CLOSE_WAIT |
>                    TCPF_CLOSE)));
>
> I guess.
>
> Thanks!
>
> Paolo
>
>
>
Eric Dumazet May 20, 2024, 2:53 p.m. UTC | #6
On Mon, May 20, 2024 at 4:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Hi,
>
> On Mon, 2024-05-20 at 16:07 +0200, Eric Dumazet wrote:
> > On Mon, May 20, 2024 at 3:46 PM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > On Mon, May 20, 2024 at 12:05 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > >
> > > > Christoph reported the following splat:
> > > >
> > > > [...]
> > > >
> > > > Listener sockets are supposed to have a zero sk_shutdown, as the
> > > > accepted children will inherit such field.
> > > >
> > > > Invoking shutdown() before entering the listener status allows
> > > > violating the above constraint.
> > > >
> > > > After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
> > > > TCP_SYN_RECV sockets"), the above causes the child to reach the accept
> > > > syscall in FIN_WAIT1 status.
> > > >
> > > > Address the issue explicitly by clearing sk_shutdown at listen time.
> > > >
> > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
> > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > > Note: the issue above reports an MPTCP reproducer, but I can reproduce
> > > > the issue even using plain TCP sockets only.
> > > > ---
> > > >  net/ipv4/inet_connection_sock.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > > index 3b38610958ee..dab723fea0cc 100644
> > > > --- a/net/ipv4/inet_connection_sock.c
> > > > +++ b/net/ipv4/inet_connection_sock.c
> > > > @@ -1269,6 +1269,8 @@ int inet_csk_listen_start(struct sock *sk)
> > > >
> > > >         reqsk_queue_alloc(&icsk->icsk_accept_queue);
> > > >
> > > > +       /* closed sockets can have non zero sk_shutdown */
> > > > +       WRITE_ONCE(sk->sk_shutdown, 0);
> > >
> > > Hi Paolo.
> > >
> > > I am unsure about your patch, I had an internal syzbot report about
> > > this before going OOO for a few days,
> > > and my first reaction was to change the WARN in inet_accept().
> > >
> > > Perhaps some applications are relying on calling shutdown() before listen()...
>
> Uhmm, right I did not consider that a non zero sk_shutdown would have
> affected recvmsg() and sendmsg() even prior to 94062790aedb ("tcp:
> defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets").
>
> > BTW the syzbot repro was
> >
> > r0 = socket$inet6_tcp(0xa, 0x1, 0x0)
> > sendto$inet6(0xffffffffffffffff, 0x0, 0x0, 0x20000004, 0x0, 0x0)
> > shutdown(r0, 0x1)
> > bind$inet6(r0, &(0x7f0000000040)={0xa, 0x4e22, 0x0, @empty}, 0x1c)
> > listen(r0, 0x0)
> > r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
> > connect$inet(r1, &(0x7f0000000000)={0x2, 0x4e22, @local}, 0x10)
> > accept(r0, 0x0, 0x0)
>
> The above is very similar to what Christoph reported. It should splat
> even replacing 0x106 with 0 (mptcp -> tcp).
>
> I'm fine with relaxing the check in __inet_accept(). Do you prefer to
> send the patch yourself, or should I send a v2? The condition should be
>
>         WARN_ON(!((1 << newsk->sk_state) &
>                   (TCPF_ESTABLISHED | TCPF_SYN_RECV |
>                    TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 |
>                    TCPF_CLOSING | TCPF_CLOSE_WAIT |
>                    TCPF_CLOSE)));
>

Please send a v2.

I am not sure why we need a WARN_ON() to begin with, the socket is
still private.

Even the lock_sock(sk2)/release_sock(sk2) pair in inet_accept() seems overkill.
Paolo Abeni May 20, 2024, 3:13 p.m. UTC | #7
On Mon, 2024-05-20 at 16:53 +0200, Eric Dumazet wrote:
> On Mon, May 20, 2024 at 4:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > 
> > Hi,
> > 
> > On Mon, 2024-05-20 at 16:07 +0200, Eric Dumazet wrote:
> > > On Mon, May 20, 2024 at 3:46 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > 
> > > > On Mon, May 20, 2024 at 12:05 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > 
> > > > > Christoph reported the following splat:
> > > > > 
> > > > > WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
> > > > > Modules linked in:
> > > > > CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
> > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
> > > > > RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
> > > > > Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
> > > > > RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
> > > > > RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
> > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > > > RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
> > > > > R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
> > > > > R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
> > > > > FS:  000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
> > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
> > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > PKRU: 55555554
> > > > > Call Trace:
> > > > >  <TASK>
> > > > >  inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
> > > > >  do_accept+0x435/0x620 net/socket.c:1929
> > > > >  __sys_accept4_file net/socket.c:1969 [inline]
> > > > >  __sys_accept4+0x9b/0x110 net/socket.c:1999
> > > > >  __do_sys_accept net/socket.c:2016 [inline]
> > > > >  __se_sys_accept net/socket.c:2013 [inline]
> > > > >  __x64_sys_accept+0x7d/0x90 net/socket.c:2013
> > > > >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > > >  do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
> > > > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > > RIP: 0033:0x4315f9
> > > > > Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
> > > > > RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
> > > > > RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
> > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> > > > > RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
> > > > > R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
> > > > > R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
> > > > >  </TASK>
> > > > > 
> > > > > Listener sockets are supposed to have a zero sk_shutdown, as the
> > > > > accepted children will inherit such field.
> > > > > 
> > > > > Invoking shutdown() before entering the listener status allows
> > > > > violating the above constraint.
> > > > > 
> > > > > After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
> > > > > TCP_SYN_RECV sockets"), the above causes the child to reach the accept
> > > > > syscall in FIN_WAIT1 status.
> > > > > 
> > > > > Address the issue explicitly by clearing sk_shutdown at listen time.
> > > > > 
> > > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
> > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > > ---
> > > > > Note: the issue above reports an MPTCP reproducer, but I can reproduce
> > > > > the issue even using plain TCP sockets only.
> > > > > ---
> > > > >  net/ipv4/inet_connection_sock.c | 2 ++
> > > > >  1 file changed, 2 insertions(+)
> > > > > 
> > > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > > > index 3b38610958ee..dab723fea0cc 100644
> > > > > --- a/net/ipv4/inet_connection_sock.c
> > > > > +++ b/net/ipv4/inet_connection_sock.c
> > > > > @@ -1269,6 +1269,8 @@ int inet_csk_listen_start(struct sock *sk)
> > > > > 
> > > > >         reqsk_queue_alloc(&icsk->icsk_accept_queue);
> > > > > 
> > > > > +       /* closed sockets can have non zero sk_shutdown */
> > > > > +       WRITE_ONCE(sk->sk_shutdown, 0);
> > > > 
> > > > Hi Paolo.
> > > > 
> > > > I am unsure about your patch, I had an internal syzbot report about
> > > > this before going OOO for a few days,
> > > > and my first reaction was to change the WARN in inet_accept().
> > > > 
> > > > Perhaps some applications are relying on calling shutdown() before listen()...
> > 
> > Uhmm, right I did not consider that a non zero sk_shutdown would have
> > affected recvmsg() and sendmsg() even prior to 94062790aedb ("tcp:
> > defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets").
> > 
> > > BTW the syzbot repro was
> > > 
> > > r0 = socket$inet6_tcp(0xa, 0x1, 0x0)
> > > sendto$inet6(0xffffffffffffffff, 0x0, 0x0, 0x20000004, 0x0, 0x0)
> > > shutdown(r0, 0x1)
> > > bind$inet6(r0, &(0x7f0000000040)={0xa, 0x4e22, 0x0, @empty}, 0x1c)
> > > listen(r0, 0x0)
> > > r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
> > > connect$inet(r1, &(0x7f0000000000)={0x2, 0x4e22, @local}, 0x10)
> > > accept(r0, 0x0, 0x0)
> > 
> > The above is very similar to what Christoph reported. It should splat
> > even replacing 0x106 with 0 (mptcp -> tcp).
> > 
> > I'm fine with relaxing the check in __inet_accept(). Do you prefer to
> > send the patch yourself, or should I send a v2? The condition should be
> > 
> >         WARN_ON(!((1 << newsk->sk_state) &
> >                   (TCPF_ESTABLISHED | TCPF_SYN_RECV |
> >                    TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 |
> >                    TCPF_CLOSING | TCPF_CLOSE_WAIT |
> >                    TCPF_CLOSE)));
> > 
> 
> Please send a v2.
> 
> I am not sure why we need a WARN_ON() to begin with, the socket is
> still private.

Digging into the history, the warning was introduced in 2.3.15; it was
a BUG_TRAP() back then.

The relevant chunk replaced explicit handling for each expected state
with more generic code handling all of them the same way. I guess the
assertion is a leftover safeguard.

I would not drop it on net, perhaps later on net-next?

> Even the lock_sock(sk2)/release_sock(sk2) pair in inet_accept() seems overkill.

Something for net-next, I guess?

Thanks!

Paolo
Eric Dumazet May 20, 2024, 3:26 p.m. UTC | #8
On Mon, May 20, 2024 at 5:13 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2024-05-20 at 16:53 +0200, Eric Dumazet wrote:
> > On Mon, May 20, 2024 at 4:46 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > >
> > > Hi,
> > >
> > > On Mon, 2024-05-20 at 16:07 +0200, Eric Dumazet wrote:
> > > > On Mon, May 20, 2024 at 3:46 PM Eric Dumazet <edumazet@google.com> wrote:
> > > > >
> > > > > On Mon, May 20, 2024 at 12:05 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > > >
> > > > > > Christoph reported the following splat:
> > > > > >
> > > > > > WARNING: CPU: 1 PID: 772 at net/ipv4/af_inet.c:761 __inet_accept+0x1f4/0x4a0
> > > > > > Modules linked in:
> > > > > > CPU: 1 PID: 772 Comm: syz-executor510 Not tainted 6.9.0-rc7-g7da7119fe22b #56
> > > > > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.11.0-2.el7 04/01/2014
> > > > > > RIP: 0010:__inet_accept+0x1f4/0x4a0 net/ipv4/af_inet.c:759
> > > > > > Code: 04 38 84 c0 0f 85 87 00 00 00 41 c7 04 24 03 00 00 00 48 83 c4 10 5b 41 5c 41 5d 41 5e 41 5f 5d c3 cc cc cc cc e8 ec b7 da fd <0f> 0b e9 7f fe ff ff e8 e0 b7 da fd 0f 0b e9 fe fe ff ff 89 d9 80
> > > > > > RSP: 0018:ffffc90000c2fc58 EFLAGS: 00010293
> > > > > > RAX: ffffffff836bdd14 RBX: 0000000000000000 RCX: ffff888104668000
> > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> > > > > > RBP: dffffc0000000000 R08: ffffffff836bdb89 R09: fffff52000185f64
> > > > > > R10: dffffc0000000000 R11: fffff52000185f64 R12: dffffc0000000000
> > > > > > R13: 1ffff92000185f98 R14: ffff88810754d880 R15: ffff8881007b7800
> > > > > > FS:  000000001c772880(0000) GS:ffff88811b280000(0000) knlGS:0000000000000000
> > > > > > CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > > > CR2: 00007fb9fcf2e178 CR3: 00000001045d2002 CR4: 0000000000770ef0
> > > > > > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > > > > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > > > > PKRU: 55555554
> > > > > > Call Trace:
> > > > > >  <TASK>
> > > > > >  inet_accept+0x138/0x1d0 net/ipv4/af_inet.c:786
> > > > > >  do_accept+0x435/0x620 net/socket.c:1929
> > > > > >  __sys_accept4_file net/socket.c:1969 [inline]
> > > > > >  __sys_accept4+0x9b/0x110 net/socket.c:1999
> > > > > >  __do_sys_accept net/socket.c:2016 [inline]
> > > > > >  __se_sys_accept net/socket.c:2013 [inline]
> > > > > >  __x64_sys_accept+0x7d/0x90 net/socket.c:2013
> > > > > >  do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> > > > > >  do_syscall_64+0x58/0x100 arch/x86/entry/common.c:83
> > > > > >  entry_SYSCALL_64_after_hwframe+0x76/0x7e
> > > > > > RIP: 0033:0x4315f9
> > > > > > Code: fd ff 48 81 c4 80 00 00 00 e9 f1 fe ff ff 0f 1f 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 ab b4 fd ff c3 66 2e 0f 1f 84 00 00 00 00
> > > > > > RSP: 002b:00007ffdb26d9c78 EFLAGS: 00000246 ORIG_RAX: 000000000000002b
> > > > > > RAX: ffffffffffffffda RBX: 0000000000400300 RCX: 00000000004315f9
> > > > > > RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000004
> > > > > > RBP: 00000000006e1018 R08: 0000000000400300 R09: 0000000000400300
> > > > > > R10: 0000000000400300 R11: 0000000000000246 R12: 0000000000000000
> > > > > > R13: 000000000040cdf0 R14: 000000000040ce80 R15: 0000000000000055
> > > > > >  </TASK>
> > > > > >
> > > > > > Listener sockets are supposed to have a zero sk_shutdown, as the
> > > > > > accepted children will inherit such field.
> > > > > >
> > > > > > Invoking shutdown() before entering the listener status allows
> > > > > > violating the above constraint.
> > > > > >
> > > > > > After commit 94062790aedb ("tcp: defer shutdown(SEND_SHUTDOWN) for
> > > > > > TCP_SYN_RECV sockets"), the above causes the child to reach the accept
> > > > > > syscall in FIN_WAIT1 status.
> > > > > >
> > > > > > Address the issue explicitly by clearing sk_shutdown at listen time.
> > > > > >
> > > > > > Reported-by: Christoph Paasch <cpaasch@apple.com>
> > > > > > Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/490
> > > > > > Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2")
> > > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > > > ---
> > > > > > Note: the issue above reports an MPTCP reproducer, but I can reproduce
> > > > > > the issue even using plain TCP sockets only.
> > > > > > ---
> > > > > >  net/ipv4/inet_connection_sock.c | 2 ++
> > > > > >  1 file changed, 2 insertions(+)
> > > > > >
> > > > > > diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
> > > > > > index 3b38610958ee..dab723fea0cc 100644
> > > > > > --- a/net/ipv4/inet_connection_sock.c
> > > > > > +++ b/net/ipv4/inet_connection_sock.c
> > > > > > @@ -1269,6 +1269,8 @@ int inet_csk_listen_start(struct sock *sk)
> > > > > >
> > > > > >         reqsk_queue_alloc(&icsk->icsk_accept_queue);
> > > > > >
> > > > > > +       /* closed sockets can have non zero sk_shutdown */
> > > > > > +       WRITE_ONCE(sk->sk_shutdown, 0);
> > > > >
> > > > > Hi Paolo.
> > > > >
> > > > > I am unsure about your patch, I had an internal syzbot report about
> > > > > this before going OOO for a few days,
> > > > > and my first reaction was to change the WARN in inet_accept().
> > > > >
> > > > > Perhaps some applications are relying on calling shutdown() before listen()...
> > >
> > > Uhmm, right I did not consider that a non zero sk_shutdown would have
> > > affected recvmsg() and sendmsg() even prior to 94062790aedb ("tcp:
> > > defer shutdown(SEND_SHUTDOWN) for TCP_SYN_RECV sockets").
> > >
> > > > BTW the syzbot repro was
> > > >
> > > > r0 = socket$inet6_tcp(0xa, 0x1, 0x0)
> > > > sendto$inet6(0xffffffffffffffff, 0x0, 0x0, 0x20000004, 0x0, 0x0)
> > > > shutdown(r0, 0x1)
> > > > bind$inet6(r0, &(0x7f0000000040)={0xa, 0x4e22, 0x0, @empty}, 0x1c)
> > > > listen(r0, 0x0)
> > > > r1 = socket$inet_mptcp(0x2, 0x1, 0x106)
> > > > connect$inet(r1, &(0x7f0000000000)={0x2, 0x4e22, @local}, 0x10)
> > > > accept(r0, 0x0, 0x0)
> > >
> > > The above is very similar to what Christoph reported. It should splat
> > > even replacing 0x106 with 0 (mptcp -> tcp).
> > >
> > > I'm fine with relaxing the check in __inet_accept(). Do you prefer to
> > > send the patch yourself, or should I send a v2? The condition should be
> > >
> > >         WARN_ON(!((1 << newsk->sk_state) &
> > >                   (TCPF_ESTABLISHED | TCPF_SYN_RECV |
> > >                    TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 |
> > >                    TCPF_CLOSING | TCPF_CLOSE_WAIT |
> > >                    TCPF_CLOSE)));
> > >
> >
> > Please send a v2.
> >
> > I am not sure why we need a WARN_ON() to begin with, the socket is
> > still private.
>
> Digging into the history, the warning was introduced in 2.3.15; it was
> a BUG_TRAP() back then.
>
> The relevant chunk replaced explicit handling for each expected state
> with more generic code handling all of them the same way. I guess the
> assertion is a leftover safeguard.
>
> I would not drop it on net, perhaps later on net-next?

Sure, let's wait for the next syzbot report if any.

>
> > Even the lock_sock(sk2)/release_sock(sk2) pair in inet_accept() seems overkill.
>
> Something for net-next, I guess?

Sure, this is orthogonal.

>
> Thanks!
>
> Paolo
>
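
The relaxed WARN_ON() condition proposed in the thread tests sk_state
against a bitmask with one bit per allowed TCP state. As a rough
userspace illustration of that idiom only (not the kernel code), the
sketch below mirrors the state numbering of include/net/tcp_states.h.
The helper name accept_state_ok() is hypothetical and exists only for
this example.

```c
#include <stdbool.h>

/* State numbering mirrors include/net/tcp_states.h in the Linux kernel. */
enum {
	TCP_ESTABLISHED = 1,
	TCP_SYN_SENT,
	TCP_SYN_RECV,
	TCP_FIN_WAIT1,
	TCP_FIN_WAIT2,
	TCP_TIME_WAIT,
	TCP_CLOSE,
	TCP_CLOSE_WAIT,
	TCP_LAST_ACK,
	TCP_LISTEN,
	TCP_CLOSING,
};

/* One flag bit per state, following the kernel's TCPF_* convention. */
enum {
	TCPF_ESTABLISHED = 1 << TCP_ESTABLISHED,
	TCPF_SYN_RECV    = 1 << TCP_SYN_RECV,
	TCPF_FIN_WAIT1   = 1 << TCP_FIN_WAIT1,
	TCPF_FIN_WAIT2   = 1 << TCP_FIN_WAIT2,
	TCPF_CLOSE       = 1 << TCP_CLOSE,
	TCPF_CLOSE_WAIT  = 1 << TCP_CLOSE_WAIT,
	TCPF_CLOSING     = 1 << TCP_CLOSING,
};

/*
 * Hypothetical helper, for illustration only: returns true when a
 * just-accepted child socket is in one of the states allowed by the
 * relaxed condition quoted in the thread. The kernel would WARN when
 * this returns false.
 */
bool accept_state_ok(int sk_state)
{
	return ((1 << sk_state) &
		(TCPF_ESTABLISHED | TCPF_SYN_RECV |
		 TCPF_FIN_WAIT1 | TCPF_FIN_WAIT2 |
		 TCPF_CLOSING | TCPF_CLOSE_WAIT |
		 TCPF_CLOSE)) != 0;
}
```

With this mask, a child that reaches accept() in FIN_WAIT1 (the case
that triggered the splat) passes the check, while states such as LISTEN
or SYN_SENT would still trip it.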

Patch

diff --git a/net/ipv4/inet_connection_sock.c b/net/ipv4/inet_connection_sock.c
index 3b38610958ee..dab723fea0cc 100644
--- a/net/ipv4/inet_connection_sock.c
+++ b/net/ipv4/inet_connection_sock.c
@@ -1269,6 +1269,8 @@  int inet_csk_listen_start(struct sock *sk)
 
 	reqsk_queue_alloc(&icsk->icsk_accept_queue);
 
+	/* closed sockets can have non zero sk_shutdown */
+	WRITE_ONCE(sk->sk_shutdown, 0);
 	sk->sk_ack_backlog = 0;
 	inet_csk_delack_init(sk);