Message ID | 20240227011041.97375-5-kuniyu@amazon.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | tcp/rds: Fix use-after-free around kernel TCP reqsk. | expand |
On Tue, Feb 27, 2024 at 2:12 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > syzkaller reported a warning of netns tracker [0] followed by KASAN > splat [1] and another ref tracker warning [1]. > > syzkaller could not find a repro, but in the log, the only suspicious > sequence was as follows: > > 18:26:22 executing program 1: > r0 = socket$inet6_mptcp(0xa, 0x1, 0x106) > ... > connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async) > > The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT. > > So, the scenario would be: > > 1. unshare(CLONE_NEWNET) creates a per netns tcp listener in > rds_tcp_listen_init(). > 2. syz-executor connect()s to it and creates a reqsk. > 3. syz-executor exit()s immediately. > 4. netns is dismantled. [0] > 5. reqsk timer is fired, and UAF happens while freeing reqsk. [1] > 6. listener is freed after RCU grace period. [2] > > Basically, reqsk assumes that the listener guarantees netns safety > until all reqsk timers are expired by holding the listener's refcount. > However, this was not the case for kernel sockets. > > Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in > inet_twsk_purge()") fixed this issue only for per-netns ehash, but > the issue still exists for the global ehash. > > We can apply the same fix, but this issue is specific to RDS. > > Instead of iterating ehash and purging reqsk during netns dismantle, > let's hold netns refcount for the kernel listener. > > > Reported-by: syzkaller <syzkaller@googlegroups.com> > Suggested-by: Eric Dumazet <edumazet@google.com> > Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> > --- > net/rds/tcp_listen.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c > index 05008ce5c421..2d40e523322c 100644 > --- a/net/rds/tcp_listen.c > +++ b/net/rds/tcp_listen.c > @@ -274,8 +274,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6) > int addr_len; > int ret; > > - ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, > - IPPROTO_TCP, &sock); > + ret = __sock_create(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, > + IPPROTO_TCP, &sock, SOCKET_KERN_NET_REF); > if (ret < 0) { > rdsdebug("could not create %s listener socket: %d\n", > isv6 ? "IPv6" : "IPv4", ret); If RDS module keeps a listener alive, not attached to a user process, netns dismantle will never occur. I think we have to cleanup SYN_RECV sockets in inet_twsk_purge() Yes, it removes one optimization you did. Perhaps add a counter of all kernel sockets that were ever attached to a netns in order to decide to apply the optimization. (keeping a precise count of SYN_RECV would be too expensive)
From: Eric Dumazet <edumazet@google.com> Date: Tue, 27 Feb 2024 13:06:07 +0100 > On Tue, Feb 27, 2024 at 2:12 AM Kuniyuki Iwashima <kuniyu@amazon.com> wrote: > > > > syzkaller reported a warning of netns tracker [0] followed by KASAN > > splat [1] and another ref tracker warning [1]. > > > > syzkaller could not find a repro, but in the log, the only suspicious > > sequence was as follows: > > > > 18:26:22 executing program 1: > > r0 = socket$inet6_mptcp(0xa, 0x1, 0x106) > > ... > > connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async) > > > > The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT. > > > > So, the scenario would be: > > > > 1. unshare(CLONE_NEWNET) creates a per netns tcp listener in > > rds_tcp_listen_init(). > > 2. syz-executor connect()s to it and creates a reqsk. > > 3. syz-executor exit()s immediately. > > 4. netns is dismantled. [0] > > 5. reqsk timer is fired, and UAF happens while freeing reqsk. [1] > > 6. listener is freed after RCU grace period. [2] > > > > Basically, reqsk assumes that the listener guarantees netns safety > > until all reqsk timers are expired by holding the listener's refcount. > > However, this was not the case for kernel sockets. > > > > Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in > > inet_twsk_purge()") fixed this issue only for per-netns ehash, but > > the issue still exists for the global ehash. > > > > We can apply the same fix, but this issue is specific to RDS. > > > > Instead of iterating ehash and purging reqsk during netns dismantle, > > let's hold netns refcount for the kernel listener. > > > > > > > Reported-by: syzkaller <syzkaller@googlegroups.com> > > Suggested-by: Eric Dumazet <edumazet@google.com> > > Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") > > Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> > > --- > > net/rds/tcp_listen.c | 4 ++-- > > 1 file changed, 2 insertions(+), 2 deletions(-) > > > > diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c > > index 05008ce5c421..2d40e523322c 100644 > > --- a/net/rds/tcp_listen.c > > +++ b/net/rds/tcp_listen.c > > @@ -274,8 +274,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6) > > int addr_len; > > int ret; > > > > - ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, > > - IPPROTO_TCP, &sock); > > + ret = __sock_create(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, > > + IPPROTO_TCP, &sock, SOCKET_KERN_NET_REF); > > if (ret < 0) { > > rdsdebug("could not create %s listener socket: %d\n", > > isv6 ? "IPv6" : "IPv4", ret); > > If RDS module keeps a listener alive, not attached to a user process, > netns dismantle will never occur. > > I think we have to cleanup SYN_RECV sockets in inet_twsk_purge() Ah.. yes, __init_net ops hook must not take net ref.. I'll go that way in v3. > > Yes, it removes one optimization you did. > > Perhaps add a counter of all kernel sockets that were ever attached to > a netns in order to decide to apply the optimization. > (keeping a precise count of SYN_RECV would be too expensive) I'll work on the follow-up for net-next after the right fix is merged. Thanks!
diff --git a/net/rds/tcp_listen.c b/net/rds/tcp_listen.c index 05008ce5c421..2d40e523322c 100644 --- a/net/rds/tcp_listen.c +++ b/net/rds/tcp_listen.c @@ -274,8 +274,8 @@ struct socket *rds_tcp_listen_init(struct net *net, bool isv6) int addr_len; int ret; - ret = sock_create_kern(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, - IPPROTO_TCP, &sock); + ret = __sock_create(net, isv6 ? PF_INET6 : PF_INET, SOCK_STREAM, + IPPROTO_TCP, &sock, SOCKET_KERN_NET_REF); if (ret < 0) { rdsdebug("could not create %s listener socket: %d\n", isv6 ? "IPv6" : "IPv4", ret);
syzkaller reported a warning of netns tracker [0] followed by KASAN splat [1] and another ref tracker warning [1]. syzkaller could not find a repro, but in the log, the only suspicious sequence was as follows: 18:26:22 executing program 1: r0 = socket$inet6_mptcp(0xa, 0x1, 0x106) ... connect$inet6(r0, &(0x7f0000000080)={0xa, 0x4001, 0x0, @loopback}, 0x1c) (async) The notable thing here is 0x4001 in connect(), which is RDS_TCP_PORT. So, the scenario would be: 1. unshare(CLONE_NEWNET) creates a per netns tcp listener in rds_tcp_listen_init(). 2. syz-executor connect()s to it and creates a reqsk. 3. syz-executor exit()s immediately. 4. netns is dismantled. [0] 5. reqsk timer is fired, and UAF happens while freeing reqsk. [1] 6. listener is freed after RCU grace period. [2] Basically, reqsk assumes that the listener guarantees netns safety until all reqsk timers are expired by holding the listener's refcount. However, this was not the case for kernel sockets. Commit 740ea3c4a0b2 ("tcp: Clean up kernel listener's reqsk in inet_twsk_purge()") fixed this issue only for per-netns ehash, but the issue still exists for the global ehash. We can apply the same fix, but this issue is specific to RDS. Instead of iterating ehash and purging reqsk during netns dismantle, let's hold netns refcount for the kernel listener. [0]: ref_tracker: net notrefcnt@0000000065449cc3 has 1/1 users at sk_alloc (./include/net/net_namespace.h:337 net/core/sock.c:2146) inet6_create (net/ipv6/af_inet6.c:192 net/ipv6/af_inet6.c:119) __sock_create (net/socket.c:1572) rds_tcp_listen_init (net/rds/tcp_listen.c:279) rds_tcp_init_net (net/rds/tcp.c:577) ops_init (net/core/net_namespace.c:137) setup_net (net/core/net_namespace.c:340) copy_net_ns (net/core/net_namespace.c:497) create_new_namespaces (kernel/nsproxy.c:110) unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4)) ksys_unshare (kernel/fork.c:3429) __x64_sys_unshare (kernel/fork.c:3496) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) ... WARNING: CPU: 0 PID: 27 at lib/ref_tracker.c:179 ref_tracker_dir_exit (lib/ref_tracker.c:179) [1]: BUG: KASAN: slab-use-after-free in inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966) Read of size 8 at addr ffff88801b370400 by task swapper/0/0 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 Call Trace: <IRQ> dump_stack_lvl (lib/dump_stack.c:107 (discriminator 1)) print_report (mm/kasan/report.c:378 mm/kasan/report.c:488) kasan_report (mm/kasan/report.c:603) inet_csk_reqsk_queue_drop (./include/net/inet_hashtables.h:180 net/ipv4/inet_connection_sock.c:952 net/ipv4/inet_connection_sock.c:966) reqsk_timer_handler (net/ipv4/inet_connection_sock.c:979 net/ipv4/inet_connection_sock.c:1092) call_timer_fn (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/timer.h:127 kernel/time/timer.c:1701) __run_timers.part.0 (kernel/time/timer.c:1752 kernel/time/timer.c:2038) run_timer_softirq (kernel/time/timer.c:2053) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14)) </IRQ> Allocated by task 258 on cpu 0 at 83.612050s: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:68) __kasan_slab_alloc (mm/kasan/common.c:343) kmem_cache_alloc (mm/slub.c:3813 mm/slub.c:3860 mm/slub.c:3867) copy_net_ns (./include/linux/slab.h:701 net/core/net_namespace.c:421 net/core/net_namespace.c:480) create_new_namespaces (kernel/nsproxy.c:110) unshare_nsproxy_namespaces (kernel/nsproxy.c:228 (discriminator 4)) ksys_unshare (kernel/fork.c:3429) __x64_sys_unshare (kernel/fork.c:3496) do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83) entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:129) Freed by task 27 on cpu 0 at 329.158864s: kasan_save_stack (mm/kasan/common.c:48) kasan_save_track (mm/kasan/common.c:68) kasan_save_free_info (mm/kasan/generic.c:643) __kasan_slab_free (mm/kasan/common.c:265) kmem_cache_free (mm/slub.c:4299 mm/slub.c:4363) cleanup_net (net/core/net_namespace.c:456 net/core/net_namespace.c:446 net/core/net_namespace.c:639) process_one_work (kernel/workqueue.c:2638) worker_thread (kernel/workqueue.c:2700 kernel/workqueue.c:2787) kthread (kernel/kthread.c:388) ret_from_fork (arch/x86/kernel/process.c:153) ret_from_fork_asm (arch/x86/entry/entry_64.S:250) The buggy address belongs to the object at ffff88801b370000 which belongs to the cache net_namespace of size 4352 The buggy address is located 1024 bytes inside of freed 4352-byte region [ffff88801b370000, ffff88801b371100) [2]: WARNING: CPU: 0 PID: 95 at lib/ref_tracker.c:228 ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1)) Modules linked in: Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014 RIP: 0010:ref_tracker_free (lib/ref_tracker.c:228 (discriminator 1)) ... Call Trace: <IRQ> __sk_destruct (./include/net/net_namespace.h:353 net/core/sock.c:2204) rcu_core (./arch/x86/include/asm/preempt.h:26 kernel/rcu/tree.c:2165 kernel/rcu/tree.c:2433) __do_softirq (./arch/x86/include/asm/jump_label.h:27 ./include/linux/jump_label.h:207 ./include/trace/events/irq.h:142 kernel/softirq.c:554) irq_exit_rcu (kernel/softirq.c:427 kernel/softirq.c:632 kernel/softirq.c:644) sysvec_apic_timer_interrupt (arch/x86/kernel/apic/apic.c:1076 (discriminator 14)) </IRQ> Reported-by: syzkaller <syzkaller@googlegroups.com> Suggested-by: Eric Dumazet <edumazet@google.com> Fixes: 467fa15356ac ("RDS-TCP: Support multiple RDS-TCP listen endpoints, one per netns.") Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> --- net/rds/tcp_listen.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-)