Message ID | 20240506015439.108739-1-guwen@linux.alibaba.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net] net/smc: fix netdev refcnt leak in smc_ib_find_route() |
On 2024-05-05 18:54, Wen Gu wrote:
> A netdev refcnt leak issue was found when unregistering netdev after
> using SMC. It can be reproduced as follows.
>
> - run tests based on SMC.
> - unregister the net device.
>
> The following error message can be observed.
>
> 'unregister_netdevice: waiting for ethx to become free. Usage count = x'
>
> With CONFIG_NET_DEV_REFCNT_TRACKER set, more detailed error message can
> be provided by refcount tracker:
>
>  unregister_netdevice: waiting for eth1 to become free. Usage count = 2
>  ref_tracker: eth%d@ffff9cabc3bf8548 has 1/1 users at
>       ___neigh_create+0x8e/0x420
>       neigh_event_ns+0x52/0xc0
>       arp_process+0x7c0/0x860
>       __netif_receive_skb_list_core+0x258/0x2c0
>       __netif_receive_skb_list+0xea/0x150
>       netif_receive_skb_list_internal+0xf2/0x1b0
>       napi_complete_done+0x73/0x1b0
>       mlx5e_napi_poll+0x161/0x5e0 [mlx5_core]
>       __napi_poll+0x2c/0x1c0
>       net_rx_action+0x2a7/0x380
>       __do_softirq+0xcd/0x2a7
>
> It is because in smc_ib_find_route(), neigh_lookup() takes a netdev
> refcnt but does not release. So fix it.
>
> Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment")
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>  net/smc/smc_ib.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
> index 97704a9e84c7..b431bd8a5172 100644
> --- a/net/smc/smc_ib.c
> +++ b/net/smc/smc_ib.c
> @@ -210,10 +210,11 @@ int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
>  		goto out;
>  	if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET)
>  		goto out;
> -	neigh = rt->dst.ops->neigh_lookup(&rt->dst, NULL, &fl4.daddr);
> +	neigh = dst_neigh_lookup(&rt->dst, &fl4.daddr);

Of the two implementations of neigh_lookup() I found that do not
simply return NULL, both increment or init struct neighbour::refcnt.

1. ipv4_neigh_lookup()
2. ip6_dst_neigh_lookup()
   a. __ipv6_neigh_lookup()
   b. neigh_create()

>  	if (neigh) {
>  		memcpy(nexthop_mac, neigh->ha, ETH_ALEN);
>  		*uses_gateway = rt->rt_uses_gateway;
> +		neigh_release(neigh);

So releasing it here looks correct.

>  		return 0;
>  	}
>  out:
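
The point made in this review, that a successful lookup hands the caller a reference which the caller must then drop, can be illustrated with a small userspace model. This is only a sketch: struct model_neigh, model_lookup(), model_release(), find_mac_leaky() and find_mac_balanced() are hypothetical names invented for illustration and are not kernel APIs. They stand in for struct neighbour, the neigh_lookup() implementations listed above, and neigh_release().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define ETH_ALEN 6

/* Hypothetical userspace stand-in for struct neighbour; not kernel code. */
struct model_neigh {
	int refcnt;
	unsigned char ha[ETH_ALEN];
};

/* A successful lookup returns the entry with one extra reference held. */
struct model_neigh *model_lookup(struct model_neigh *n)
{
	if (n)
		n->refcnt++;
	return n;
}

/* Drop one reference previously taken by model_lookup(). */
void model_release(struct model_neigh *n)
{
	assert(n->refcnt > 0);
	n->refcnt--;
}

/* Pre-patch shape: copies the MAC but never drops the reference. */
int find_mac_leaky(struct model_neigh *entry, unsigned char *mac)
{
	struct model_neigh *n = model_lookup(entry);

	if (!n)
		return -1;
	memcpy(mac, n->ha, ETH_ALEN);
	return 0;
}

/* Patched shape: the reference is dropped once the MAC has been copied. */
int find_mac_balanced(struct model_neigh *entry, unsigned char *mac)
{
	struct model_neigh *n = model_lookup(entry);

	if (!n)
		return -1;
	memcpy(mac, n->ha, ETH_ALEN);
	model_release(n);
	return 0;
}
```

In this model every call to find_mac_leaky() leaves the count one higher than before, which corresponds to the "Usage count" that never returns to zero in the report, while find_mac_balanced() leaves it unchanged.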
On 2024-05-06 at 07:24:39, Wen Gu (guwen@linux.alibaba.com) wrote:
> A netdev refcnt leak issue was found when unregistering netdev after
> using SMC. It can be reproduced as follows.
>
> - run tests based on SMC.
> - unregister the net device.
>
> The following error message can be observed.
>
> 'unregister_netdevice: waiting for ethx to become free. Usage count = x'
>
> With CONFIG_NET_DEV_REFCNT_TRACKER set, more detailed error message can
> be provided by refcount tracker:
>
>  unregister_netdevice: waiting for eth1 to become free. Usage count = 2
>  ref_tracker: eth%d@ffff9cabc3bf8548 has 1/1 users at
>       ___neigh_create+0x8e/0x420
>       neigh_event_ns+0x52/0xc0
>       arp_process+0x7c0/0x860
>       __netif_receive_skb_list_core+0x258/0x2c0
>       __netif_receive_skb_list+0xea/0x150
>       netif_receive_skb_list_internal+0xf2/0x1b0
>       napi_complete_done+0x73/0x1b0
>       mlx5e_napi_poll+0x161/0x5e0 [mlx5_core]
>       __napi_poll+0x2c/0x1c0
>       net_rx_action+0x2a7/0x380
>       __do_softirq+0xcd/0x2a7
>
> It is because in smc_ib_find_route(), neigh_lookup() takes a netdev
> refcnt but does not release. So fix it.
>
> Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment")
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>  net/smc/smc_ib.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
> index 97704a9e84c7..b431bd8a5172 100644
> --- a/net/smc/smc_ib.c
> +++ b/net/smc/smc_ib.c
> @@ -210,10 +210,11 @@ int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
>  		goto out;
>  	if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET)
need to release it here as well ?
>  		goto out;
> -	neigh = rt->dst.ops->neigh_lookup(&rt->dst, NULL, &fl4.daddr);
> +	neigh = dst_neigh_lookup(&rt->dst, &fl4.daddr);
>  	if (neigh) {
>  		memcpy(nexthop_mac, neigh->ha, ETH_ALEN);
>  		*uses_gateway = rt->rt_uses_gateway;
> +		neigh_release(neigh);
>  		return 0;
>  	}
>  out:
> --
> 2.32.0.3.g01195cf9f
>
On 2024/5/6 13:51, Ratheesh Kannoth wrote:
> On 2024-05-06 at 07:24:39, Wen Gu (guwen@linux.alibaba.com) wrote:
>> A netdev refcnt leak issue was found when unregistering netdev after
>> using SMC. It can be reproduced as follows.
>>
>> - run tests based on SMC.
>> - unregister the net device.
>>
>> The following error message can be observed.
>>
>> 'unregister_netdevice: waiting for ethx to become free. Usage count = x'
>>
>> With CONFIG_NET_DEV_REFCNT_TRACKER set, more detailed error message can
>> be provided by refcount tracker:
>>
>>  unregister_netdevice: waiting for eth1 to become free. Usage count = 2
>>  ref_tracker: eth%d@ffff9cabc3bf8548 has 1/1 users at
>>       ___neigh_create+0x8e/0x420
>>       neigh_event_ns+0x52/0xc0
>>       arp_process+0x7c0/0x860
>>       __netif_receive_skb_list_core+0x258/0x2c0
>>       __netif_receive_skb_list+0xea/0x150
>>       netif_receive_skb_list_internal+0xf2/0x1b0
>>       napi_complete_done+0x73/0x1b0
>>       mlx5e_napi_poll+0x161/0x5e0 [mlx5_core]
>>       __napi_poll+0x2c/0x1c0
>>       net_rx_action+0x2a7/0x380
>>       __do_softirq+0xcd/0x2a7
>>
>> It is because in smc_ib_find_route(), neigh_lookup() takes a netdev
>> refcnt but does not release. So fix it.
>>
>> Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment")
>> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
>> ---
>>  net/smc/smc_ib.c | 3 ++-
>>  1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
>> index 97704a9e84c7..b431bd8a5172 100644
>> --- a/net/smc/smc_ib.c
>> +++ b/net/smc/smc_ib.c
>> @@ -210,10 +210,11 @@ int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
>>  		goto out;
>>  	if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET)
> need to release it here as well ?
>

Do you mean call ip_rt_put() to release rt?

Yes, after investigating here, I agree that rt needs to be released
as well. Thanks!

>>  		goto out;
>> -	neigh = rt->dst.ops->neigh_lookup(&rt->dst, NULL, &fl4.daddr);
>> +	neigh = dst_neigh_lookup(&rt->dst, &fl4.daddr);
>>  	if (neigh) {
>>  		memcpy(nexthop_mac, neigh->ha, ETH_ALEN);
>>  		*uses_gateway = rt->rt_uses_gateway;
>> +		neigh_release(neigh);
>>  		return 0;
>>  	}
>>  out:
>> --
>> 2.32.0.3.g01195cf9f
>>
On 06.05.24 03:54, Wen Gu wrote:
> A netdev refcnt leak issue was found when unregistering netdev after
> using SMC. It can be reproduced as follows.
>
> - run tests based on SMC.
> - unregister the net device.
>
> The following error message can be observed.
>
> 'unregister_netdevice: waiting for ethx to become free. Usage count = x'
>
> With CONFIG_NET_DEV_REFCNT_TRACKER set, more detailed error message can
> be provided by refcount tracker:
>
>  unregister_netdevice: waiting for eth1 to become free. Usage count = 2
>  ref_tracker: eth%d@ffff9cabc3bf8548 has 1/1 users at
>       ___neigh_create+0x8e/0x420
>       neigh_event_ns+0x52/0xc0
>       arp_process+0x7c0/0x860
>       __netif_receive_skb_list_core+0x258/0x2c0
>       __netif_receive_skb_list+0xea/0x150
>       netif_receive_skb_list_internal+0xf2/0x1b0
>       napi_complete_done+0x73/0x1b0
>       mlx5e_napi_poll+0x161/0x5e0 [mlx5_core]
>       __napi_poll+0x2c/0x1c0
>       net_rx_action+0x2a7/0x380
>       __do_softirq+0xcd/0x2a7
>
> It is because in smc_ib_find_route(), neigh_lookup() takes a netdev
> refcnt but does not release. So fix it.
>
> Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment")
> Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
> ---
>  net/smc/smc_ib.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
> index 97704a9e84c7..b431bd8a5172 100644
> --- a/net/smc/smc_ib.c
> +++ b/net/smc/smc_ib.c
> @@ -210,10 +210,11 @@ int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
>  		goto out;
>  	if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET)
>  		goto out;
> -	neigh = rt->dst.ops->neigh_lookup(&rt->dst, NULL, &fl4.daddr);
> +	neigh = dst_neigh_lookup(&rt->dst, &fl4.daddr);
>  	if (neigh) {
>  		memcpy(nexthop_mac, neigh->ha, ETH_ALEN);
>  		*uses_gateway = rt->rt_uses_gateway;
> +		neigh_release(neigh);
>  		return 0;
>  	}
>  out:

Hi Wen,

Thanks for fixing that! It looks good to me and works well.
Please release rt for that condition in the next version. (Thx, @Ratheesh!)

Thanks,
Wenjia
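
The request to release rt on the early-exit condition is an instance of the usual goto-cleanup ladder, where each label undoes exactly what has been acquired so far. The following userspace model is a sketch of that pattern only, not the next version of this patch: struct model_ref, model_get(), model_put() and model_find_route() are hypothetical names, with model_put() standing in for both ip_rt_put() and neigh_release(), and the gw_family_ok flag standing in for the rt_gw_family check.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical refcounted object; models both the route and the neighbour. */
struct model_ref {
	int refcnt;
};

void model_get(struct model_ref *r)
{
	r->refcnt++;
}

void model_put(struct model_ref *r)
{
	assert(r->refcnt > 0);
	r->refcnt--;
}

/*
 * Model of a route lookup followed by a neighbour lookup.
 * rt == NULL models a failed route lookup, gw_family_ok == 0 models the
 * unsupported-gateway condition, neigh == NULL models a failed neighbour
 * lookup. Returns 0 on success and -1 otherwise.
 */
int model_find_route(struct model_ref *rt, struct model_ref *neigh,
		     int gw_family_ok)
{
	int rc = -1;

	if (!rt)
		goto out;		/* nothing acquired yet */
	model_get(rt);			/* route lookup returns a held reference */

	if (!gw_family_ok)
		goto out_rt;		/* must still drop the route */
	if (!neigh)
		goto out_rt;
	model_get(neigh);		/* neighbour lookup returns a held reference */

	rc = 0;				/* the MAC address would be copied here */
	model_put(neigh);
out_rt:
	model_put(rt);
out:
	return rc;
}
```

Every path through model_find_route() leaves both counters where they started. A bare jump to out after the route has been obtained would skip model_put(rt), which is the case Ratheesh pointed out.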
diff --git a/net/smc/smc_ib.c b/net/smc/smc_ib.c
index 97704a9e84c7..b431bd8a5172 100644
--- a/net/smc/smc_ib.c
+++ b/net/smc/smc_ib.c
@@ -210,10 +210,11 @@ int smc_ib_find_route(struct net *net, __be32 saddr, __be32 daddr,
 		goto out;
 	if (rt->rt_uses_gateway && rt->rt_gw_family != AF_INET)
 		goto out;
-	neigh = rt->dst.ops->neigh_lookup(&rt->dst, NULL, &fl4.daddr);
+	neigh = dst_neigh_lookup(&rt->dst, &fl4.daddr);
 	if (neigh) {
 		memcpy(nexthop_mac, neigh->ha, ETH_ALEN);
 		*uses_gateway = rt->rt_uses_gateway;
+		neigh_release(neigh);
 		return 0;
 	}
 out:
A netdev refcnt leak issue was found when unregistering netdev after
using SMC. It can be reproduced as follows.

- run tests based on SMC.
- unregister the net device.

The following error message can be observed.

'unregister_netdevice: waiting for ethx to become free. Usage count = x'

With CONFIG_NET_DEV_REFCNT_TRACKER set, more detailed error message can
be provided by refcount tracker:

 unregister_netdevice: waiting for eth1 to become free. Usage count = 2
 ref_tracker: eth%d@ffff9cabc3bf8548 has 1/1 users at
      ___neigh_create+0x8e/0x420
      neigh_event_ns+0x52/0xc0
      arp_process+0x7c0/0x860
      __netif_receive_skb_list_core+0x258/0x2c0
      __netif_receive_skb_list+0xea/0x150
      netif_receive_skb_list_internal+0xf2/0x1b0
      napi_complete_done+0x73/0x1b0
      mlx5e_napi_poll+0x161/0x5e0 [mlx5_core]
      __napi_poll+0x2c/0x1c0
      net_rx_action+0x2a7/0x380
      __do_softirq+0xcd/0x2a7

It is because in smc_ib_find_route(), neigh_lookup() takes a netdev
refcnt but does not release. So fix it.

Fixes: e5c4744cfb59 ("net/smc: add SMC-Rv2 connection establishment")
Signed-off-by: Wen Gu <guwen@linux.alibaba.com>
---
 net/smc/smc_ib.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)