Message ID | 20240725-tcp-ao-static-branch-rcu-v1-1-021d009beebf@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net/tcp: Disable TCP-AO static key after RCU grace period | expand |
On Thu, Jul 25, 2024 at 7:00 AM Dmitry Safonov via B4 Relay <devnull+0x7f454c46.gmail.com@kernel.org> wrote: > > From: Dmitry Safonov <0x7f454c46@gmail.com> > > The lifetime of TCP-AO static_key is the same as the last > tcp_ao_info. On the socket destruction tcp_ao_info ceases to be > with RCU grace period, while tcp-ao static branch is currently deferred > destructed. The static key definition is > : DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ); > > which means that if RCU grace period is delayed by more than a second > and tcp_ao_needed is in the process of disablement, other CPUs may > yet see tcp_ao_info which atent dead, but soon-to-be. > And that breaks the assumption of static_key_fast_inc_not_disabled(). I am afraid I do not understand this changelog at all. What is "the assumption of static_key_fast_inc_not_disabled()" you are referring to ? I think it would help to provide more details. > > Happened on netdev test-bot[1], so not a theoretical issue: > > [] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1 > [] ------------[ cut here ]------------ > [] kernel BUG at arch/x86/kernel/jump_label.c:73! > [] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI > [] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1 > [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > [] Workqueue: events jump_label_update_timeout > [] RIP: 0010:__jump_label_patch+0x2f6/0x350 > ... > [] Call Trace: > [] <TASK> > [] arch_jump_label_transform_queue+0x6c/0x110 > [] __jump_label_update+0xef/0x350 > [] __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60 > [] jump_label_update_timeout+0x2c/0x40 > [] process_one_work+0xe3b/0x1670 > [] worker_thread+0x587/0xce0 > [] kthread+0x28a/0x350 > [] ret_from_fork+0x31/0x70 > [] ret_from_fork_asm+0x1a/0x30 > [] </TASK> > [] Modules linked in: veth > [] ---[ end trace 0000000000000000 ]--- > [] RIP: 0010:__jump_label_patch+0x2f6/0x350 > > [1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr > > Cc: stable@kernel.org > Fixes: 67fa83f7c86a ("net/tcp: Add static_key for TCP-AO") > Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> > --- > --- > net/ipv4/tcp_ao.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c > index 85531437890c..5ce914d3e3db 100644 > --- a/net/ipv4/tcp_ao.c > +++ b/net/ipv4/tcp_ao.c > @@ -267,6 +267,14 @@ static void tcp_ao_key_free_rcu(struct rcu_head *head) > kfree_sensitive(key); > } > > +static void tcp_ao_info_free_rcu(struct rcu_head *head) > +{ > + struct tcp_ao_info *ao = container_of(head, struct tcp_ao_info, rcu); > + > + kfree(ao); > + static_branch_slow_dec_deferred(&tcp_ao_needed); > +} > + > void tcp_ao_destroy_sock(struct sock *sk, bool twsk) > { > struct tcp_ao_info *ao; > @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk) > atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc); > call_rcu(&key->rcu, tcp_ao_key_free_rcu); > } > - > - kfree_rcu(ao, rcu); > - static_branch_slow_dec_deferred(&tcp_ao_needed); > + call_rcu(&ao->rcu, tcp_ao_info_free_rcu); > } > > void tcp_ao_time_wait(struct tcp_timewait_sock *tcptw, struct tcp_sock *tp) > > --- > base-commit: c33ffdb70cc6df4105160f991288e7d2567d7ffa > change-id: 20240725-tcp-ao-static-branch-rcu-85ede7b3a1a5 > > Best regards, > -- > Dmitry Safonov <0x7f454c46@gmail.com> > >
Hi Eric, thanks for looking into this, On Thu, 25 Jul 2024 at 08:48, Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Jul 25, 2024 at 7:00 AM Dmitry Safonov via B4 Relay > <devnull+0x7f454c46.gmail.com@kernel.org> wrote: > > > > From: Dmitry Safonov <0x7f454c46@gmail.com> > > > > The lifetime of TCP-AO static_key is the same as the last > > tcp_ao_info. On the socket destruction tcp_ao_info ceases to be > > with RCU grace period, while tcp-ao static branch is currently deferred > > destructed. The static key definition is > > : DEFINE_STATIC_KEY_DEFERRED_FALSE(tcp_ao_needed, HZ); > > > > which means that if RCU grace period is delayed by more than a second > > and tcp_ao_needed is in the process of disablement, other CPUs may > > yet see tcp_ao_info which atent dead, but soon-to-be. > > > And that breaks the assumption of static_key_fast_inc_not_disabled(). > > I am afraid I do not understand this changelog at all. > > What is "the assumption of static_key_fast_inc_not_disabled()" you > are referring to ? > > I think it would help to provide more details. Sorry, my bad, I'm referring here to the comment/description for the function: * static_key_fast_inc_not_disabled - adds a user for a static key * @key: static key that must be already enabled * * The caller must make sure that the static key can't get disabled while * in this function. It doesn't patch jump labels, only adds a user to * an already enabled static key. Originally it was introduced in commit eb8c507296f6 ("jump_label: Prevent key->enabled int overflow"), which is needed for the atomic contexts, one of which would be the creation of a full socket from a request socket. In that atomic context, we know by the presence of the key (md5/ao) that the static branch is already enabled. So, we can just increment the ref counter for that static branch instead of holding the proper mutex. static_key_fast_inc_not_disabled() is just a helper for that usage case. But it must not be used if the static branch could get disabled in parallel as it's not protected by jump_label_mutex and as a result, races with jump_label_update() implementation details. Specifically, from the log in [1], I see that jump_label_type() wrongly tells arch_jump_label_transform_queue() to enable the static_brach, when the caller was, in fact, __static_key_slow_dec_cpuslocked() - requesting to disable that. And then, the x86-specific code produces: : jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1 when it tries to enable the static key; but the op-code is not no-op, it's 2-byte jump. The reason for that of course is that intended operation was to disable the branch, but it has raced with this increment helper. Hopefully, that clarifies somewhat the situation here. Thankfully, for TCP-MD5 I did a better job: tcp_md5sig_info_free_rcu() and tcp_md5_twsk_free_rcu() are RCU callbacks. Also, please note that I intentionally call static_branch_slow_dec_deferred() variant in the RCU callback, rather than synchronous. The reason for that: * When the control is directly exposed to userspace, it is prudent to delay the * decrement to avoid high frequency code modifications which can (and do) * cause significant performance degradation. Struct static_key_deferred and * static_key_slow_dec_deferred() provide for this. > > > > > Happened on netdev test-bot[1], so not a theoretical issue: > > > > [] jump_label: Fatal kernel bug, unexpected op at tcp_inbound_hash+0x1a7/0x870 [ffffffffa8c4e9b7] (eb 50 0f 1f 44 != 66 90 0f 1f 00)) size:2 type:1 > > [] ------------[ cut here ]------------ > > [] kernel BUG at arch/x86/kernel/jump_label.c:73! > > [] Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN NOPTI > > [] CPU: 3 PID: 243 Comm: kworker/3:3 Not tainted 6.10.0-virtme #1 > > [] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 > > [] Workqueue: events jump_label_update_timeout > > [] RIP: 0010:__jump_label_patch+0x2f6/0x350 > > ... > > [] Call Trace: > > [] <TASK> > > [] arch_jump_label_transform_queue+0x6c/0x110 > > [] __jump_label_update+0xef/0x350 > > [] __static_key_slow_dec_cpuslocked.part.0+0x3c/0x60 > > [] jump_label_update_timeout+0x2c/0x40 > > [] process_one_work+0xe3b/0x1670 > > [] worker_thread+0x587/0xce0 > > [] kthread+0x28a/0x350 > > [] ret_from_fork+0x31/0x70 > > [] ret_from_fork_asm+0x1a/0x30 > > [] </TASK> > > [] Modules linked in: veth > > [] ---[ end trace 0000000000000000 ]--- > > [] RIP: 0010:__jump_label_patch+0x2f6/0x350 > > > > [1]: https://netdev-3.bots.linux.dev/vmksft-tcp-ao-dbg/results/696681/5-connect-deny-ipv6/stderr > > > > Cc: stable@kernel.org > > Fixes: 67fa83f7c86a ("net/tcp: Add static_key for TCP-AO") > > Signed-off-by: Dmitry Safonov <0x7f454c46@gmail.com> > > --- > > --- > > net/ipv4/tcp_ao.c | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c > > index 85531437890c..5ce914d3e3db 100644 > > --- a/net/ipv4/tcp_ao.c > > +++ b/net/ipv4/tcp_ao.c > > @@ -267,6 +267,14 @@ static void tcp_ao_key_free_rcu(struct rcu_head *head) > > kfree_sensitive(key); > > } > > > > +static void tcp_ao_info_free_rcu(struct rcu_head *head) > > +{ > > + struct tcp_ao_info *ao = container_of(head, struct tcp_ao_info, rcu); > > + > > + kfree(ao); > > + static_branch_slow_dec_deferred(&tcp_ao_needed); > > +} > > + > > void tcp_ao_destroy_sock(struct sock *sk, bool twsk) > > { > > struct tcp_ao_info *ao; > > @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk) > > atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc); > > call_rcu(&key->rcu, tcp_ao_key_free_rcu); > > } > > - > > - kfree_rcu(ao, rcu); > > - static_branch_slow_dec_deferred(&tcp_ao_needed); > > + call_rcu(&ao->rcu, tcp_ao_info_free_rcu); > > } > > > > void tcp_ao_time_wait(struct tcp_timewait_sock *tcptw, struct tcp_sock *tp) > > > > --- > > base-commit: c33ffdb70cc6df4105160f991288e7d2567d7ffa > > change-id: 20240725-tcp-ao-static-branch-rcu-85ede7b3a1a5 > > > > Best regards, > > -- > > Dmitry Safonov <0x7f454c46@gmail.com> > > > > Thanks, Dmitry
On Thu, 25 Jul 2024 06:00:02 +0100 Dmitry Safonov via B4 Relay wrote: > @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk) > atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc); > call_rcu(&key->rcu, tcp_ao_key_free_rcu); > } > - > - kfree_rcu(ao, rcu); > - static_branch_slow_dec_deferred(&tcp_ao_needed); > + call_rcu(&ao->rcu, tcp_ao_info_free_rcu); Maybe free the keys inside tcp_ao_info_free_rcu, too? IIUC you're saying that new sock is still looking at this ao under RCU protection - messing with the key list feels a tiny bit odd since the object is technically "live" until the end of the RCU grace period.
Hi Jakub, On Sat, 27 Jul 2024 at 03:34, Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 25 Jul 2024 06:00:02 +0100 Dmitry Safonov via B4 Relay wrote: > > @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk) > > atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc); > > call_rcu(&key->rcu, tcp_ao_key_free_rcu); > > } > > - > > - kfree_rcu(ao, rcu); > > - static_branch_slow_dec_deferred(&tcp_ao_needed); > > + call_rcu(&ao->rcu, tcp_ao_info_free_rcu); > > Maybe free the keys inside tcp_ao_info_free_rcu, too? > IIUC you're saying that new sock is still looking at this ao under RCU > protection - messing with the key list feels a tiny bit odd since the > object is technically "live" until the end of the RCU grace period. Yeah, I think that's possible. Looking at it now, I think it also needs : rcu_assign_pointer(tp->ao_info, NULL) above, rather than a plain null-assign. Will fix and send v2. Thanks, Dmitry
diff --git a/net/ipv4/tcp_ao.c b/net/ipv4/tcp_ao.c index 85531437890c..5ce914d3e3db 100644 --- a/net/ipv4/tcp_ao.c +++ b/net/ipv4/tcp_ao.c @@ -267,6 +267,14 @@ static void tcp_ao_key_free_rcu(struct rcu_head *head) kfree_sensitive(key); } +static void tcp_ao_info_free_rcu(struct rcu_head *head) +{ + struct tcp_ao_info *ao = container_of(head, struct tcp_ao_info, rcu); + + kfree(ao); + static_branch_slow_dec_deferred(&tcp_ao_needed); +} + void tcp_ao_destroy_sock(struct sock *sk, bool twsk) { struct tcp_ao_info *ao; @@ -290,9 +298,7 @@ void tcp_ao_destroy_sock(struct sock *sk, bool twsk) atomic_sub(tcp_ao_sizeof_key(key), &sk->sk_omem_alloc); call_rcu(&key->rcu, tcp_ao_key_free_rcu); } - - kfree_rcu(ao, rcu); - static_branch_slow_dec_deferred(&tcp_ao_needed); + call_rcu(&ao->rcu, tcp_ao_info_free_rcu); } void tcp_ao_time_wait(struct tcp_timewait_sock *tcptw, struct tcp_sock *tp)