Message ID | 20240423205408.39632-1-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | c4e86b4363ac8ecf836def8e46973707688fd98f |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] net: add two more call_rcu_hurry() | expand |
On 4/23/24 2:54 PM, Eric Dumazet wrote: > I had failures with pmtu.sh selftests lately, > with netns dismantles firing ref_tracking alerts [1]. > > After much debugging, I found that some queued > rcu callbacks were delayed by minutes, because > of CONFIG_RCU_LAZY=y option. > > Joel Fernandes had a similar issue in the past, > fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry() > for dst_release()") > > In this commit, I make sure nexthop_free_rcu() > and free_fib_info_rcu() are not delayed too much > because they both can release device references. > > tools/testing/selftests/net/pmtu.sh no longer fails. > > Traces were: > > [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 3/5 users at > dst_alloc+0x76/0x160 > ip6_dst_alloc+0x25/0x80 > ip6_pol_route+0x2a8/0x450 > ip6_pol_route_output+0x1f/0x30 > fib6_rule_lookup+0x163/0x270 > ip6_route_output_flags+0xda/0x190 > ip6_dst_lookup_tail.constprop.0+0x1d0/0x260 > ip6_dst_lookup_flow+0x47/0xa0 > udp_tunnel6_dst_lookup+0x158/0x210 > vxlan_xmit_one+0x4c2/0x1550 [vxlan] > vxlan_xmit+0x52d/0x14f0 [vxlan] > dev_hard_start_xmit+0x7b/0x1e0 > __dev_queue_xmit+0x20b/0xe40 > ip6_finish_output2+0x2ea/0x6e0 > ip6_finish_output+0x143/0x320 > ip6_output+0x74/0x140 > > [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at > netdev_get_by_index+0xc0/0xe0 > fib6_nh_init+0x1a9/0xa90 > rtm_new_nexthop+0x6fa/0x1580 > rtnetlink_rcv_msg+0x155/0x3e0 > netlink_rcv_skb+0x61/0x110 > rtnetlink_rcv+0x19/0x20 > netlink_unicast+0x23f/0x380 > netlink_sendmsg+0x1fc/0x430 > ____sys_sendmsg+0x2ef/0x320 > ___sys_sendmsg+0x86/0xd0 > __sys_sendmsg+0x67/0xc0 > __x64_sys_sendmsg+0x21/0x30 > x64_sys_call+0x252/0x2030 > do_syscall_64+0x6c/0x190 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at > ipv6_add_dev+0x136/0x530 > addrconf_notify+0x19d/0x770 > notifier_call_chain+0x65/0xd0 > raw_notifier_call_chain+0x1a/0x20 > call_netdevice_notifiers_info+0x54/0x90 > register_netdevice+0x61e/0x790 > veth_newlink+0x230/0x440 > __rtnl_newlink+0x7d2/0xaa0 > rtnl_newlink+0x4c/0x70 > rtnetlink_rcv_msg+0x155/0x3e0 > netlink_rcv_skb+0x61/0x110 > rtnetlink_rcv+0x19/0x20 > netlink_unicast+0x23f/0x380 > netlink_sendmsg+0x1fc/0x430 > ____sys_sendmsg+0x2ef/0x320 > ___sys_sendmsg+0x86/0xd0 > .... > [ 1079.316024] ? show_regs+0x68/0x80 > [ 1079.316087] ? __warn+0x8c/0x140 > [ 1079.316103] ? ref_tracker_free+0x1a0/0x270 > [ 1079.316117] ? report_bug+0x196/0x1c0 > [ 1079.316135] ? handle_bug+0x42/0x80 > [ 1079.316149] ? exc_invalid_op+0x1c/0x70 > [ 1079.316162] ? asm_exc_invalid_op+0x1f/0x30 > [ 1079.316193] ? ref_tracker_free+0x1a0/0x270 > [ 1079.316208] ? _raw_spin_unlock+0x1a/0x40 > [ 1079.316222] ? free_unref_page+0x126/0x1a0 > [ 1079.316239] ? destroy_large_folio+0x69/0x90 > [ 1079.316251] ? __folio_put+0x99/0xd0 > [ 1079.316276] dst_dev_put+0x69/0xd0 > [ 1079.316308] fib6_nh_release_dsts.part.0+0x3d/0x80 > [ 1079.316327] fib6_nh_release+0x45/0x70 > [ 1079.316340] nexthop_free_rcu+0x131/0x170 > [ 1079.316356] rcu_do_batch+0x1ee/0x820 > [ 1079.316370] ? rcu_do_batch+0x179/0x820 > [ 1079.316388] rcu_core+0x1aa/0x4d0 > [ 1079.316405] rcu_core_si+0x12/0x20 > [ 1079.316417] __do_softirq+0x13a/0x3dc > [ 1079.316435] __irq_exit_rcu+0xa3/0x110 > [ 1079.316449] irq_exit_rcu+0x12/0x30 > [ 1079.316462] sysvec_apic_timer_interrupt+0x5b/0xe0 > [ 1079.316474] asm_sysvec_apic_timer_interrupt+0x1f/0x30 > [ 1079.316569] RIP: 0033:0x7f06b65c63f0 > > Signed-off-by: Eric Dumazet <edumazet@google.com> > Cc: Joel Fernandes (Google) <joel@joelfernandes.org> > Cc: Paul E. McKenney <paulmck@kernel.org> > --- > include/net/nexthop.h | 2 +- > net/ipv4/fib_semantics.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > Reviewed-by: David Ahern <dsahern@kernel.org>
On Tue, 2024-04-23 at 20:54 +0000, Eric Dumazet wrote: > I had failures with pmtu.sh selftests lately, > with netns dismantles firing ref_tracking alerts [1]. > > After much debugging, I found that some queued > rcu callbacks were delayed by minutes, because > of CONFIG_RCU_LAZY=y option. > > Joel Fernandes had a similar issue in the past, > fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry() > for dst_release()") > > In this commit, I make sure nexthop_free_rcu() > and free_fib_info_rcu() are not delayed too much > because they both can release device references. Great debugging! I'm wondering how many other similar situations we have out there??? Have you considered instead adding a synchronize_rcu() alongside the rcu_barrier() in netdev_wait_allrefs_any()? If I read correctly commit 483c26ff63f4, That should kick all the possibly pending lazy rcu operation. The patch LGTM, I'm just "thinking aloud". Thanks, Paolo
Hi Paolo On Wed, Apr 24, 2024 at 10:01 AM Paolo Abeni <pabeni@redhat.com> wrote: > > On Tue, 2024-04-23 at 20:54 +0000, Eric Dumazet wrote: > > I had failures with pmtu.sh selftests lately, > > with netns dismantles firing ref_tracking alerts [1]. > > > > After much debugging, I found that some queued > > rcu callbacks were delayed by minutes, because > > of CONFIG_RCU_LAZY=y option. > > > > Joel Fernandes had a similar issue in the past, > > fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry() > > for dst_release()") > > > > In this commit, I make sure nexthop_free_rcu() > > and free_fib_info_rcu() are not delayed too much > > because they both can release device references. > > Great debugging! > > I'm wondering how many other similar situations we have out there??? I think there is another candidate for inet_free_ifa() diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c index 7592f242336b7fdf67e79dbd75407cf03e841cfc..cd2f0af7240899795abff0087730db2bb755c36e 100644 --- a/net/ipv4/devinet.c +++ b/net/ipv4/devinet.c @@ -231,7 +231,7 @@ static void inet_rcu_free_ifa(struct rcu_head *head) static void inet_free_ifa(struct in_ifaddr *ifa) { - call_rcu(&ifa->rcu_head, inet_rcu_free_ifa); + call_rcu_hurry(&ifa->rcu_head, inet_rcu_free_ifa); } > > Have you considered instead adding a synchronize_rcu() alongside the > rcu_barrier() in netdev_wait_allrefs_any()? If I read correctly commit > 483c26ff63f4, That should kick all the possibly pending lazy rcu > operation. synchronize_rcu() could return very fast, even if queued rcu items are still lingering. I tried the following patch, this does not help. Were you thinking of something else ? diff --git a/net/core/dev.c b/net/core/dev.c index 8bdc59074b29c287e432c73fffbe93c63d539ad2..a727290011693081b13ac6065e2ff2810bb5739d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10559,6 +10559,7 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list) rebroadcast_time = jiffies; } + synchronize_rcu_expedited(); if (!wait) { rcu_barrier(); wait = WAIT_REFS_MIN_MSECS; > > The patch LGTM, I'm just "thinking aloud". > > Thanks, > > Paolo >
On Wed, Apr 24, 2024 at 3:01 PM Eric Dumazet <edumazet@google.com> wrote: > > Hi Paolo > > On Wed, Apr 24, 2024 at 10:01 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > On Tue, 2024-04-23 at 20:54 +0000, Eric Dumazet wrote: > > > I had failures with pmtu.sh selftests lately, > > > with netns dismantles firing ref_tracking alerts [1]. > > > > > > After much debugging, I found that some queued > > > rcu callbacks were delayed by minutes, because > > > of CONFIG_RCU_LAZY=y option. > > > > > > Joel Fernandes had a similar issue in the past, > > > fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry() > > > for dst_release()") > > > > > > In this commit, I make sure nexthop_free_rcu() > > > and free_fib_info_rcu() are not delayed too much > > > because they both can release device references. > > > > Great debugging! > > > > I'm wondering how many other similar situations we have out there??? > > I think there is another candidate for inet_free_ifa() > > diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c > index 7592f242336b7fdf67e79dbd75407cf03e841cfc..cd2f0af7240899795abff0087730db2bb755c36e > 100644 > --- a/net/ipv4/devinet.c > +++ b/net/ipv4/devinet.c > @@ -231,7 +231,7 @@ static void inet_rcu_free_ifa(struct rcu_head *head) > > static void inet_free_ifa(struct in_ifaddr *ifa) > { > - call_rcu(&ifa->rcu_head, inet_rcu_free_ifa); > + call_rcu_hurry(&ifa->rcu_head, inet_rcu_free_ifa); > } > > > > > > Have you considered instead adding a synchronize_rcu() alongside the > > rcu_barrier() in netdev_wait_allrefs_any()? If I read correctly commit > > 483c26ff63f4, That should kick all the possibly pending lazy rcu > > operation. > > synchronize_rcu() could return very fast, even if queued rcu items are > still lingering. > > I tried the following patch, this does not help. > > Were you thinking of something else ? > > diff --git a/net/core/dev.c b/net/core/dev.c > index 8bdc59074b29c287e432c73fffbe93c63d539ad2..a727290011693081b13ac6065e2ff2810bb5739d > 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -10559,6 +10559,7 @@ static struct net_device > *netdev_wait_allrefs_any(struct list_head *list) > rebroadcast_time = jiffies; > } > > + synchronize_rcu_expedited(); > if (!wait) { > rcu_barrier(); > wait = WAIT_REFS_MIN_MSECS; > > Note that if we are in this part of the code, we are already in some trouble, so the call_rcu_hurry() would avoid this. I will try :: diff --git a/net/core/dev.c b/net/core/dev.c index 8bdc59074b29c287e432c73fffbe93c63d539ad2..b4aa5b7070897edbe8a26fbb51d23e85a7a09dc4 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10559,8 +10559,8 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list) rebroadcast_time = jiffies; } + rcu_barrier(); if (!wait) { - rcu_barrier(); wait = WAIT_REFS_MIN_MSECS; } else { msleep(wait);
On Wed, 2024-04-24 at 15:01 +0200, Eric Dumazet wrote: > Hi Paolo > > On Wed, Apr 24, 2024 at 10:01 AM Paolo Abeni <pabeni@redhat.com> wrote: > > > > On Tue, 2024-04-23 at 20:54 +0000, Eric Dumazet wrote: > > > I had failures with pmtu.sh selftests lately, > > > with netns dismantles firing ref_tracking alerts [1]. > > > > > > After much debugging, I found that some queued > > > rcu callbacks were delayed by minutes, because > > > of CONFIG_RCU_LAZY=y option. > > > > > > Joel Fernandes had a similar issue in the past, > > > fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry() > > > for dst_release()") > > > > > > In this commit, I make sure nexthop_free_rcu() > > > and free_fib_info_rcu() are not delayed too much > > > because they both can release device references. > > > > Great debugging! > > > > I'm wondering how many other similar situations we have out there??? > > I think there is another candidate for inet_free_ifa() > > diff --git a/net/ipv4/devinet.c b/net/ipv4/devinet.c > index 7592f242336b7fdf67e79dbd75407cf03e841cfc..cd2f0af7240899795abff0087730db2bb755c36e > 100644 > --- a/net/ipv4/devinet.c > +++ b/net/ipv4/devinet.c > @@ -231,7 +231,7 @@ static void inet_rcu_free_ifa(struct rcu_head *head) > > static void inet_free_ifa(struct in_ifaddr *ifa) > { > - call_rcu(&ifa->rcu_head, inet_rcu_free_ifa); > + call_rcu_hurry(&ifa->rcu_head, inet_rcu_free_ifa); > } > > > > > > Have you considered instead adding a synchronize_rcu() alongside the > > rcu_barrier() in netdev_wait_allrefs_any()? If I read correctly commit > > 483c26ff63f4, That should kick all the possibly pending lazy rcu > > operation. > > synchronize_rcu() could return very fast, even if queued rcu items are > still lingering. > > I tried the following patch, this does not help. > > Were you thinking of something else ? I'm not sure if the _expedited() variant implies the call_rcu_hurry() needed to flush the pending lazy callbacks. My expectation was "yes", but quickly skimming over the code hints otherwise. Additionally, re-reading the 483c26ff63f4 changelog, I now think synchornize_rcu() will kick the pending lazy callback only on the current CPU, which should not be enough here. I guess your original patch is the only option. Thanks for the additional investigation, Paolo
Hello: This patch was applied to netdev/net-next.git (main) by Jakub Kicinski <kuba@kernel.org>: On Tue, 23 Apr 2024 20:54:08 +0000 you wrote: > I had failures with pmtu.sh selftests lately, > with netns dismantles firing ref_tracking alerts [1]. > > After much debugging, I found that some queued > rcu callbacks were delayed by minutes, because > of CONFIG_RCU_LAZY=y option. > > [...] Here is the summary with links: - [net-next] net: add two more call_rcu_hurry() https://git.kernel.org/netdev/net-next/c/c4e86b4363ac You are awesome, thank you!
diff --git a/include/net/nexthop.h b/include/net/nexthop.h index 7ca315ad500e7f4e8135a964e1c0cf48f0139436..68463aebcc05928468ca776bf207a73b39e836ae 100644 --- a/include/net/nexthop.h +++ b/include/net/nexthop.h @@ -267,7 +267,7 @@ static inline bool nexthop_get(struct nexthop *nh) static inline void nexthop_put(struct nexthop *nh) { if (refcount_dec_and_test(&nh->refcnt)) - call_rcu(&nh->rcu, nexthop_free_rcu); + call_rcu_hurry(&nh->rcu, nexthop_free_rcu); } static inline bool nexthop_cmp(const struct nexthop *nh1, diff --git a/net/ipv4/fib_semantics.c b/net/ipv4/fib_semantics.c index 5eb1b8d302bbd1c408c4999636b6cf4b81a0ad7e..f669da98d11d8f2a1182e6e84d98c2b92f774a88 100644 --- a/net/ipv4/fib_semantics.c +++ b/net/ipv4/fib_semantics.c @@ -254,7 +254,7 @@ void free_fib_info(struct fib_info *fi) return; } - call_rcu(&fi->rcu, free_fib_info_rcu); + call_rcu_hurry(&fi->rcu, free_fib_info_rcu); } EXPORT_SYMBOL_GPL(free_fib_info);
I had failures with pmtu.sh selftests lately, with netns dismantles firing ref_tracking alerts [1]. After much debugging, I found that some queued rcu callbacks were delayed by minutes, because of CONFIG_RCU_LAZY=y option. Joel Fernandes had a similar issue in the past, fixed with commit 483c26ff63f4 ("net: Use call_rcu_hurry() for dst_release()") In this commit, I make sure nexthop_free_rcu() and free_fib_info_rcu() are not delayed too much because they both can release device references. tools/testing/selftests/net/pmtu.sh no longer fails. Traces were: [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 3/5 users at dst_alloc+0x76/0x160 ip6_dst_alloc+0x25/0x80 ip6_pol_route+0x2a8/0x450 ip6_pol_route_output+0x1f/0x30 fib6_rule_lookup+0x163/0x270 ip6_route_output_flags+0xda/0x190 ip6_dst_lookup_tail.constprop.0+0x1d0/0x260 ip6_dst_lookup_flow+0x47/0xa0 udp_tunnel6_dst_lookup+0x158/0x210 vxlan_xmit_one+0x4c2/0x1550 [vxlan] vxlan_xmit+0x52d/0x14f0 [vxlan] dev_hard_start_xmit+0x7b/0x1e0 __dev_queue_xmit+0x20b/0xe40 ip6_finish_output2+0x2ea/0x6e0 ip6_finish_output+0x143/0x320 ip6_output+0x74/0x140 [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at netdev_get_by_index+0xc0/0xe0 fib6_nh_init+0x1a9/0xa90 rtm_new_nexthop+0x6fa/0x1580 rtnetlink_rcv_msg+0x155/0x3e0 netlink_rcv_skb+0x61/0x110 rtnetlink_rcv+0x19/0x20 netlink_unicast+0x23f/0x380 netlink_sendmsg+0x1fc/0x430 ____sys_sendmsg+0x2ef/0x320 ___sys_sendmsg+0x86/0xd0 __sys_sendmsg+0x67/0xc0 __x64_sys_sendmsg+0x21/0x30 x64_sys_call+0x252/0x2030 do_syscall_64+0x6c/0x190 entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 968.179860] ref_tracker: veth_A-R1@00000000d0ff3fe2 has 1/5 users at ipv6_add_dev+0x136/0x530 addrconf_notify+0x19d/0x770 notifier_call_chain+0x65/0xd0 raw_notifier_call_chain+0x1a/0x20 call_netdevice_notifiers_info+0x54/0x90 register_netdevice+0x61e/0x790 veth_newlink+0x230/0x440 __rtnl_newlink+0x7d2/0xaa0 rtnl_newlink+0x4c/0x70 rtnetlink_rcv_msg+0x155/0x3e0 netlink_rcv_skb+0x61/0x110 rtnetlink_rcv+0x19/0x20 netlink_unicast+0x23f/0x380 netlink_sendmsg+0x1fc/0x430 ____sys_sendmsg+0x2ef/0x320 ___sys_sendmsg+0x86/0xd0 .... [ 1079.316024] ? show_regs+0x68/0x80 [ 1079.316087] ? __warn+0x8c/0x140 [ 1079.316103] ? ref_tracker_free+0x1a0/0x270 [ 1079.316117] ? report_bug+0x196/0x1c0 [ 1079.316135] ? handle_bug+0x42/0x80 [ 1079.316149] ? exc_invalid_op+0x1c/0x70 [ 1079.316162] ? asm_exc_invalid_op+0x1f/0x30 [ 1079.316193] ? ref_tracker_free+0x1a0/0x270 [ 1079.316208] ? _raw_spin_unlock+0x1a/0x40 [ 1079.316222] ? free_unref_page+0x126/0x1a0 [ 1079.316239] ? destroy_large_folio+0x69/0x90 [ 1079.316251] ? __folio_put+0x99/0xd0 [ 1079.316276] dst_dev_put+0x69/0xd0 [ 1079.316308] fib6_nh_release_dsts.part.0+0x3d/0x80 [ 1079.316327] fib6_nh_release+0x45/0x70 [ 1079.316340] nexthop_free_rcu+0x131/0x170 [ 1079.316356] rcu_do_batch+0x1ee/0x820 [ 1079.316370] ? rcu_do_batch+0x179/0x820 [ 1079.316388] rcu_core+0x1aa/0x4d0 [ 1079.316405] rcu_core_si+0x12/0x20 [ 1079.316417] __do_softirq+0x13a/0x3dc [ 1079.316435] __irq_exit_rcu+0xa3/0x110 [ 1079.316449] irq_exit_rcu+0x12/0x30 [ 1079.316462] sysvec_apic_timer_interrupt+0x5b/0xe0 [ 1079.316474] asm_sysvec_apic_timer_interrupt+0x1f/0x30 [ 1079.316569] RIP: 0033:0x7f06b65c63f0 Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Joel Fernandes (Google) <joel@joelfernandes.org> Cc: Paul E. McKenney <paulmck@kernel.org> --- include/net/nexthop.h | 2 +- net/ipv4/fib_semantics.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)