mbox series

[net-next,0/3] dst_cache: cope with device removal

Message ID cover.1717087015.git.pabeni@redhat.com (mailing list archive)
Headers show
Series dst_cache: cope with device removal | expand

Message

Paolo Abeni May 30, 2024, 5:18 p.m. UTC
Eric reported a net device refcount leak and diagnosed the root cause
as the dst_cache not coping well with the underlying device removal.

To address such issue, this series introduces the infrastructure to let
the existing uncached list handle the relevant cleanup.

Patch 1 and 2 are preparation changes to make the uncached list infra
more flexible for the new use-case, and patch 3 addresses the issue.

---
Targeting net-next as the addressed problem is quite ancient and I fear
some unexpected side effects for patch 2.

Paolo Abeni (3):
  ipv6: use a new flag to indicate elevated refcount.
  ipv4: obsolete routes moved out of per cpu cache
  dst_cache: let rt_uncached cope with dst_cache cleanup

 include/net/ip6_fib.h | 3 +++
 net/core/dst_cache.c  | 8 ++++++++
 net/ipv4/route.c      | 2 +-
 net/ipv6/route.c      | 4 ++--
 4 files changed, 14 insertions(+), 3 deletions(-)

Comments

Eric Dumazet May 30, 2024, 5:38 p.m. UTC | #1
On Thu, May 30, 2024 at 7:21 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Eric reported a net device refcount leak and diagnosed the root cause
> as the dst_cache not coping well with the underlying device removal.
>
> To address such issue, this series introduces the infrastructure to let
> the existing uncached list handle the relevant cleanup.
>
> Patch 1 and 2 are preparation changes to make the uncached list infra
> more flexible for the new use-case, and patch 3 addresses the issue.
>
> ---
> Targeting net-next as the addressed problem is quite ancient and I fear
> some unexpected side effects for patch 2.

Thanks Paolo, I am going to test this ASAP.

BTW I found suspect dst_cache uses from lwtunnel in the output path.
AFAIK lwtunnel_output() does not block BH.
Either we change lwtunnel_output() or replace some of the ->output
methods to use local_bh_disable() ?

If BH is already held, I do not think
preempt_disable()/preempt_enable(); pairs are necessary.

diff --git a/net/ipv6/ioam6_iptunnel.c b/net/ipv6/ioam6_iptunnel.c
index 7563f8c6aa87cf9f7841ee78dcea2a16f60ac344..bf7120ecea1ebe834e70073710be0c1692d7ad1d
100644
--- a/net/ipv6/ioam6_iptunnel.c
+++ b/net/ipv6/ioam6_iptunnel.c
@@ -351,9 +351,9 @@ static int ioam6_output(struct net *net, struct
sock *sk, struct sk_buff *skb)
                goto drop;

        if (!ipv6_addr_equal(&orig_daddr, &ipv6_hdr(skb)->daddr)) {
-               preempt_disable();
+               local_bh_disable();
                dst = dst_cache_get(&ilwt->cache);
-               preempt_enable();
+               local_bh_enable();

                if (unlikely(!dst)) {
                        struct ipv6hdr *hdr = ipv6_hdr(skb);
@@ -373,9 +373,9 @@ static int ioam6_output(struct net *net, struct
sock *sk, struct sk_buff *skb)
                                goto drop;
                        }

-                       preempt_disable();
+                       local_bh_disable();
                        dst_cache_set_ip6(&ilwt->cache, dst, &fl6.saddr);
-                       preempt_enable();
+                       local_bh_enable();
                }

                skb_dst_drop(skb);