Message ID | 20230728163152.682078-1-vladbu@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] vlan: Fix VLAN 0 memory leak | expand |
On Fri, Jul 28, 2023 at 06:31:52PM +0200, Vlad Buslov wrote: > The referenced commit intended to fix memleak of VLAN 0 that is implicitly > created on devices with NETIF_F_HW_VLAN_CTAG_FILTER feature. However, it > doesn't take into account that the feature can be re-set during the > netdevice lifetime which will cause memory leak if feature is disabled > during the device deletion as illustrated by [0]. Fix the leak by > unconditionally deleting VLAN 0 on NETDEV_DOWN event. Specifically, what happens is: > > [0]: > > modprobe 8021q > > ip l set dev eth2 up VID 0 is created with reference count of 1 > > ethtool -k eth2 | grep rx-vlan-filter > rx-vlan-filter: on > > ethtool -K eth2 rx-vlan-filter off > > ip l set dev eth2 down Reference count is not dropped because the feature is off > > ip l set dev eth2 up Reference count is not increased because the feature is off. It could have been increased if this line was preceded by: ethtool -K eth2 rx-vlan-filter on > > modprobe -r mlx5_ib > > modprobe -r mlx5_core Reference count is not dropped during NETDEV_DOWN because the feature is off and NETDEV_UNREGISTER only dismantles upper VLAN devices, resulting in VID 0 being leaked. > > echo scan > /sys/kernel/debug/kmemleak > > cat /sys/kernel/debug/kmemleak > unreferenced object 0xffff888165af1c00 (size 256): > comm "ip", pid 1847, jiffies 4294908816 (age 155.892s) > hex dump (first 32 bytes): > 00 80 12 0c 81 88 ff ff 00 00 00 00 00 00 00 00 ................ > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > backtrace: > [<0000000081646e58>] kmalloc_trace+0x27/0xc0 > [<0000000096c47f74>] vlan_vid_add+0x444/0x750 > [<00000000a7304a26>] vlan_device_event+0x1f1/0x1f20 [8021q] > [<00000000a888adcb>] notifier_call_chain+0x97/0x240 > [<000000005a6ebbb6>] __dev_notify_flags+0xe2/0x250 > [<00000000d423db72>] dev_change_flags+0xfa/0x170 > [<0000000048bc9621>] do_setlink+0x84b/0x3140 > [<0000000087d26a73>] __rtnl_newlink+0x954/0x1550 > [<00000000f767fdc2>] rtnl_newlink+0x5f/0x90 > [<0000000093aed008>] rtnetlink_rcv_msg+0x336/0xa40 > [<000000008d83ca71>] netlink_rcv_skb+0x12c/0x360 > [<000000006227c8de>] netlink_unicast+0x438/0x710 > [<00000000957f18cf>] netlink_sendmsg+0x7a0/0xc70 > [<00000000768833ad>] sock_sendmsg+0xc5/0x190 > [<0000000048d43666>] ____sys_sendmsg+0x534/0x6b0 > [<00000000bd83c8d6>] ___sys_sendmsg+0xeb/0x170 > unreferenced object 0xffff888122bb9080 (size 32): > comm "ip", pid 1847, jiffies 4294908816 (age 155.892s) > hex dump (first 32 bytes): > a0 1c af 65 81 88 ff ff a0 1c af 65 81 88 ff ff ...e.......e.... > 81 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ > backtrace: > [<0000000081646e58>] kmalloc_trace+0x27/0xc0 > [<00000000174174bb>] vlan_vid_add+0x4fd/0x750 > [<00000000a7304a26>] vlan_device_event+0x1f1/0x1f20 [8021q] > [<00000000a888adcb>] notifier_call_chain+0x97/0x240 > [<000000005a6ebbb6>] __dev_notify_flags+0xe2/0x250 > [<00000000d423db72>] dev_change_flags+0xfa/0x170 > [<0000000048bc9621>] do_setlink+0x84b/0x3140 > [<0000000087d26a73>] __rtnl_newlink+0x954/0x1550 > [<00000000f767fdc2>] rtnl_newlink+0x5f/0x90 > [<0000000093aed008>] rtnetlink_rcv_msg+0x336/0xa40 > [<000000008d83ca71>] netlink_rcv_skb+0x12c/0x360 > [<000000006227c8de>] netlink_unicast+0x438/0x710 > [<00000000957f18cf>] netlink_sendmsg+0x7a0/0xc70 > [<00000000768833ad>] sock_sendmsg+0xc5/0x190 > [<0000000048d43666>] ____sys_sendmsg+0x534/0x6b0 > [<00000000bd83c8d6>] ___sys_sendmsg+0xeb/0x170 > > Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct") > Signed-off-by: Vlad Buslov <vladbu@nvidia.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com>
On Sun, Jul 30, 2023 at 06:30:15PM +0300, Ido Schimmel wrote: > On Fri, Jul 28, 2023 at 06:31:52PM +0200, Vlad Buslov wrote: > > The referenced commit intended to fix memleak of VLAN 0 that is implicitly > > created on devices with NETIF_F_HW_VLAN_CTAG_FILTER feature. However, it > > doesn't take into account that the feature can be re-set during the > > netdevice lifetime which will cause memory leak if feature is disabled > > during the device deletion as illustrated by [0]. Fix the leak by > > unconditionally deleting VLAN 0 on NETDEV_DOWN event. > > Specifically, what happens is: > > > > > [0]: > > > modprobe 8021q > > > ip l set dev eth2 up > > VID 0 is created with reference count of 1 > > > > ethtool -k eth2 | grep rx-vlan-filter > > rx-vlan-filter: on > > > ethtool -K eth2 rx-vlan-filter off > > > ip l set dev eth2 down > > Reference count is not dropped because the feature is off > > > > ip l set dev eth2 up > > Reference count is not increased because the feature is off. It could > have been increased if this line was preceded by: > > ethtool -K eth2 rx-vlan-filter on > > > > modprobe -r mlx5_ib > > > modprobe -r mlx5_core > > Reference count is not dropped during NETDEV_DOWN because the feature is > off and NETDEV_UNREGISTER only dismantles upper VLAN devices, resulting > in VID 0 being leaked. Thanks Ido and Vlad, perhaps it would be worth including the information added by Ido above in the patch description. Not a hard requirement from my side, just an idea.
On Mon, Jul 31, 2023 at 11:52:19AM +0200, Simon Horman wrote: > perhaps it would be worth including the information added > by Ido above in the patch description. Not a hard requirement > from my side, just an idea. I agree (assuming my analysis is correct).
On Mon 31 Jul 2023 at 18:45, Ido Schimmel <idosch@idosch.org> wrote: > On Mon, Jul 31, 2023 at 11:52:19AM +0200, Simon Horman wrote: >> perhaps it would be worth including the information added >> by Ido above in the patch description. Not a hard requirement >> from my side, just an idea. > > I agree (assuming my analysis is correct). Sorry for making it more convoluted than necessary. Just disabling HW vlan feature on the device and removing the module is enough for repro: # modprobe 8021q # ip l set dev eth2 up # ethtool -K eth2 rx-vlan-filter off # modprobe -r mlx5_ib # modprobe -r mlx5_core # cat /sys/kernel/debug/kmemleak unreferenced object 0xffff888103dcd900 (size 256): comm "ip", pid 1490, jiffies 4294907305 (age 325.364s) hex dump (first 32 bytes): 00 80 5d 03 81 88 ff ff 00 00 00 00 00 00 00 00 ..]............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000899f3bb9>] kmalloc_trace+0x25/0x80 [<000000002889a7a2>] vlan_vid_add+0xa0/0x210 [<000000007177800e>] vlan_device_event+0x374/0x760 [8021q] [<000000009a0716b1>] notifier_call_chain+0x35/0xb0 [<00000000bbf3d162>] __dev_notify_flags+0x58/0xf0 [<0000000053d2b05d>] dev_change_flags+0x4d/0x60 [<00000000982807e9>] do_setlink+0x28d/0x10a0 [<0000000058c1be00>] __rtnl_newlink+0x545/0x980 [<00000000e66c3bd9>] rtnl_newlink+0x44/0x70 [<00000000a2cc5970>] rtnetlink_rcv_msg+0x29c/0x390 [<00000000d307d1e4>] netlink_rcv_skb+0x54/0x100 [<00000000259d16f9>] netlink_unicast+0x1f6/0x2c0 [<000000007ce2afa1>] netlink_sendmsg+0x232/0x4a0 [<00000000f3f4bb39>] sock_sendmsg+0x38/0x60 [<000000002f9c0624>] ____sys_sendmsg+0x1e3/0x200 [<00000000d6ff5520>] ___sys_sendmsg+0x80/0xc0 unreferenced object 0xffff88813354fde0 (size 32): comm "ip", pid 1490, jiffies 4294907305 (age 325.364s) hex dump (first 32 bytes): a0 d9 dc 03 81 88 ff ff a0 d9 dc 03 81 88 ff ff ................ 81 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<00000000899f3bb9>] kmalloc_trace+0x25/0x80 [<000000002da64724>] vlan_vid_add+0xdf/0x210 [<000000007177800e>] vlan_device_event+0x374/0x760 [8021q] [<000000009a0716b1>] notifier_call_chain+0x35/0xb0 [<00000000bbf3d162>] __dev_notify_flags+0x58/0xf0 [<0000000053d2b05d>] dev_change_flags+0x4d/0x60 [<00000000982807e9>] do_setlink+0x28d/0x10a0 [<0000000058c1be00>] __rtnl_newlink+0x545/0x980 [<00000000e66c3bd9>] rtnl_newlink+0x44/0x70 [<00000000a2cc5970>] rtnetlink_rcv_msg+0x29c/0x390 [<00000000d307d1e4>] netlink_rcv_skb+0x54/0x100 [<00000000259d16f9>] netlink_unicast+0x1f6/0x2c0 [<000000007ce2afa1>] netlink_sendmsg+0x232/0x4a0 [<00000000f3f4bb39>] sock_sendmsg+0x38/0x60 [<000000002f9c0624>] ____sys_sendmsg+0x1e3/0x200 [<00000000d6ff5520>] ___sys_sendmsg+0x80/0xc0
diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c index e40aa3e3641c..b3662119ddbc 100644 --- a/net/8021q/vlan.c +++ b/net/8021q/vlan.c @@ -384,8 +384,7 @@ static int vlan_device_event(struct notifier_block *unused, unsigned long event, dev->name); vlan_vid_add(dev, htons(ETH_P_8021Q), 0); } - if (event == NETDEV_DOWN && - (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER)) + if (event == NETDEV_DOWN) vlan_vid_del(dev, htons(ETH_P_8021Q), 0); vlan_info = rtnl_dereference(dev->vlan_info);