diff mbox series

[net] vlan: Fix VLAN 0 memory leak

Message ID 20230728163152.682078-1-vladbu@nvidia.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] vlan: Fix VLAN 0 memory leak | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers success CCed 8 of 8 maintainers
netdev/build_clang success Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Vlad Buslov July 28, 2023, 4:31 p.m. UTC
The referenced commit intended to fix memleak of VLAN 0 that is implicitly
created on devices with NETIF_F_HW_VLAN_CTAG_FILTER feature. However, it
doesn't take into account that the feature can be re-set during the
netdevice lifetime which will cause memory leak if feature is disabled
during the device deletion as illustrated by [0]. Fix the leak by
unconditionally deleting VLAN 0 on NETDEV_DOWN event.

[0]:
> modprobe 8021q
> ip l set dev eth2 up
> ethtool -k eth2 | grep rx-vlan-filter
rx-vlan-filter: on
> ethtool -K eth2 rx-vlan-filter off
> ip l set dev eth2 down
> ip l set dev eth2 up
> modprobe -r mlx5_ib
> modprobe -r mlx5_core
> echo scan > /sys/kernel/debug/kmemleak
> cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff888165af1c00 (size 256):
  comm "ip", pid 1847, jiffies 4294908816 (age 155.892s)
  hex dump (first 32 bytes):
    00 80 12 0c 81 88 ff ff 00 00 00 00 00 00 00 00  ................
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000081646e58>] kmalloc_trace+0x27/0xc0
    [<0000000096c47f74>] vlan_vid_add+0x444/0x750
    [<00000000a7304a26>] vlan_device_event+0x1f1/0x1f20 [8021q]
    [<00000000a888adcb>] notifier_call_chain+0x97/0x240
    [<000000005a6ebbb6>] __dev_notify_flags+0xe2/0x250
    [<00000000d423db72>] dev_change_flags+0xfa/0x170
    [<0000000048bc9621>] do_setlink+0x84b/0x3140
    [<0000000087d26a73>] __rtnl_newlink+0x954/0x1550
    [<00000000f767fdc2>] rtnl_newlink+0x5f/0x90
    [<0000000093aed008>] rtnetlink_rcv_msg+0x336/0xa40
    [<000000008d83ca71>] netlink_rcv_skb+0x12c/0x360
    [<000000006227c8de>] netlink_unicast+0x438/0x710
    [<00000000957f18cf>] netlink_sendmsg+0x7a0/0xc70
    [<00000000768833ad>] sock_sendmsg+0xc5/0x190
    [<0000000048d43666>] ____sys_sendmsg+0x534/0x6b0
    [<00000000bd83c8d6>] ___sys_sendmsg+0xeb/0x170
unreferenced object 0xffff888122bb9080 (size 32):
  comm "ip", pid 1847, jiffies 4294908816 (age 155.892s)
  hex dump (first 32 bytes):
    a0 1c af 65 81 88 ff ff a0 1c af 65 81 88 ff ff  ...e.......e....
    81 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<0000000081646e58>] kmalloc_trace+0x27/0xc0
    [<00000000174174bb>] vlan_vid_add+0x4fd/0x750
    [<00000000a7304a26>] vlan_device_event+0x1f1/0x1f20 [8021q]
    [<00000000a888adcb>] notifier_call_chain+0x97/0x240
    [<000000005a6ebbb6>] __dev_notify_flags+0xe2/0x250
    [<00000000d423db72>] dev_change_flags+0xfa/0x170
    [<0000000048bc9621>] do_setlink+0x84b/0x3140
    [<0000000087d26a73>] __rtnl_newlink+0x954/0x1550
    [<00000000f767fdc2>] rtnl_newlink+0x5f/0x90
    [<0000000093aed008>] rtnetlink_rcv_msg+0x336/0xa40
    [<000000008d83ca71>] netlink_rcv_skb+0x12c/0x360
    [<000000006227c8de>] netlink_unicast+0x438/0x710
    [<00000000957f18cf>] netlink_sendmsg+0x7a0/0xc70
    [<00000000768833ad>] sock_sendmsg+0xc5/0x190
    [<0000000048d43666>] ____sys_sendmsg+0x534/0x6b0
    [<00000000bd83c8d6>] ___sys_sendmsg+0xeb/0x170

Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
Signed-off-by: Vlad Buslov <vladbu@nvidia.com>
---
 net/8021q/vlan.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Ido Schimmel July 30, 2023, 3:30 p.m. UTC | #1
On Fri, Jul 28, 2023 at 06:31:52PM +0200, Vlad Buslov wrote:
> The referenced commit intended to fix memleak of VLAN 0 that is implicitly
> created on devices with NETIF_F_HW_VLAN_CTAG_FILTER feature. However, it
> doesn't take into account that the feature can be re-set during the
> netdevice lifetime which will cause memory leak if feature is disabled
> during the device deletion as illustrated by [0]. Fix the leak by
> unconditionally deleting VLAN 0 on NETDEV_DOWN event.

Specifically, what happens is:

> 
> [0]:
> > modprobe 8021q
> > ip l set dev eth2 up

VID 0 is created with reference count of 1

> > ethtool -k eth2 | grep rx-vlan-filter
> rx-vlan-filter: on
> > ethtool -K eth2 rx-vlan-filter off
> > ip l set dev eth2 down

Reference count is not dropped because the feature is off

> > ip l set dev eth2 up

Reference count is not increased because the feature is off. It could
have been increased if this line was preceded by:

ethtool -K eth2 rx-vlan-filter on

> > modprobe -r mlx5_ib
> > modprobe -r mlx5_core

Reference count is not dropped during NETDEV_DOWN because the feature is
off and NETDEV_UNREGISTER only dismantles upper VLAN devices, resulting
in VID 0 being leaked.

> > echo scan > /sys/kernel/debug/kmemleak
> > cat /sys/kernel/debug/kmemleak
> unreferenced object 0xffff888165af1c00 (size 256):
>   comm "ip", pid 1847, jiffies 4294908816 (age 155.892s)
>   hex dump (first 32 bytes):
>     00 80 12 0c 81 88 ff ff 00 00 00 00 00 00 00 00  ................
>     00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<0000000081646e58>] kmalloc_trace+0x27/0xc0
>     [<0000000096c47f74>] vlan_vid_add+0x444/0x750
>     [<00000000a7304a26>] vlan_device_event+0x1f1/0x1f20 [8021q]
>     [<00000000a888adcb>] notifier_call_chain+0x97/0x240
>     [<000000005a6ebbb6>] __dev_notify_flags+0xe2/0x250
>     [<00000000d423db72>] dev_change_flags+0xfa/0x170
>     [<0000000048bc9621>] do_setlink+0x84b/0x3140
>     [<0000000087d26a73>] __rtnl_newlink+0x954/0x1550
>     [<00000000f767fdc2>] rtnl_newlink+0x5f/0x90
>     [<0000000093aed008>] rtnetlink_rcv_msg+0x336/0xa40
>     [<000000008d83ca71>] netlink_rcv_skb+0x12c/0x360
>     [<000000006227c8de>] netlink_unicast+0x438/0x710
>     [<00000000957f18cf>] netlink_sendmsg+0x7a0/0xc70
>     [<00000000768833ad>] sock_sendmsg+0xc5/0x190
>     [<0000000048d43666>] ____sys_sendmsg+0x534/0x6b0
>     [<00000000bd83c8d6>] ___sys_sendmsg+0xeb/0x170
> unreferenced object 0xffff888122bb9080 (size 32):
>   comm "ip", pid 1847, jiffies 4294908816 (age 155.892s)
>   hex dump (first 32 bytes):
>     a0 1c af 65 81 88 ff ff a0 1c af 65 81 88 ff ff  ...e.......e....
>     81 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
>   backtrace:
>     [<0000000081646e58>] kmalloc_trace+0x27/0xc0
>     [<00000000174174bb>] vlan_vid_add+0x4fd/0x750
>     [<00000000a7304a26>] vlan_device_event+0x1f1/0x1f20 [8021q]
>     [<00000000a888adcb>] notifier_call_chain+0x97/0x240
>     [<000000005a6ebbb6>] __dev_notify_flags+0xe2/0x250
>     [<00000000d423db72>] dev_change_flags+0xfa/0x170
>     [<0000000048bc9621>] do_setlink+0x84b/0x3140
>     [<0000000087d26a73>] __rtnl_newlink+0x954/0x1550
>     [<00000000f767fdc2>] rtnl_newlink+0x5f/0x90
>     [<0000000093aed008>] rtnetlink_rcv_msg+0x336/0xa40
>     [<000000008d83ca71>] netlink_rcv_skb+0x12c/0x360
>     [<000000006227c8de>] netlink_unicast+0x438/0x710
>     [<00000000957f18cf>] netlink_sendmsg+0x7a0/0xc70
>     [<00000000768833ad>] sock_sendmsg+0xc5/0x190
>     [<0000000048d43666>] ____sys_sendmsg+0x534/0x6b0
>     [<00000000bd83c8d6>] ___sys_sendmsg+0xeb/0x170
> 
> Fixes: efc73f4bbc23 ("net: Fix memory leak - vlan_info struct")
> Signed-off-by: Vlad Buslov <vladbu@nvidia.com>

Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Simon Horman July 31, 2023, 9:52 a.m. UTC | #2
On Sun, Jul 30, 2023 at 06:30:15PM +0300, Ido Schimmel wrote:
> On Fri, Jul 28, 2023 at 06:31:52PM +0200, Vlad Buslov wrote:
> > The referenced commit intended to fix memleak of VLAN 0 that is implicitly
> > created on devices with NETIF_F_HW_VLAN_CTAG_FILTER feature. However, it
> > doesn't take into account that the feature can be re-set during the
> > netdevice lifetime which will cause memory leak if feature is disabled
> > during the device deletion as illustrated by [0]. Fix the leak by
> > unconditionally deleting VLAN 0 on NETDEV_DOWN event.
> 
> Specifically, what happens is:
> 
> > 
> > [0]:
> > > modprobe 8021q
> > > ip l set dev eth2 up
> 
> VID 0 is created with reference count of 1
> 
> > > ethtool -k eth2 | grep rx-vlan-filter
> > rx-vlan-filter: on
> > > ethtool -K eth2 rx-vlan-filter off
> > > ip l set dev eth2 down
> 
> Reference count is not dropped because the feature is off
> 
> > > ip l set dev eth2 up
> 
> Reference count is not increased because the feature is off. It could
> have been increased if this line was preceded by:
> 
> ethtool -K eth2 rx-vlan-filter on
> 
> > > modprobe -r mlx5_ib
> > > modprobe -r mlx5_core
> 
> Reference count is not dropped during NETDEV_DOWN because the feature is
> off and NETDEV_UNREGISTER only dismantles upper VLAN devices, resulting
> in VID 0 being leaked.

Thanks Ido and Vlad,

perhaps it would be worth including the information added
by Ido above in the patch description. Not a hard requirement
from my side, just an idea.
Ido Schimmel July 31, 2023, 3:45 p.m. UTC | #3
On Mon, Jul 31, 2023 at 11:52:19AM +0200, Simon Horman wrote:
> perhaps it would be worth including the information added
> by Ido above in the patch description. Not a hard requirement
> from my side, just an idea.

I agree (assuming my analysis is correct).
Vlad Buslov July 31, 2023, 7:11 p.m. UTC | #4
On Mon 31 Jul 2023 at 18:45, Ido Schimmel <idosch@idosch.org> wrote:
> On Mon, Jul 31, 2023 at 11:52:19AM +0200, Simon Horman wrote:
>> perhaps it would be worth including the information added
>> by Ido above in the patch description. Not a hard requirement
>> from my side, just an idea.
>
> I agree (assuming my analysis is correct).

Sorry for making it more convoluted than necessary. Just disabling HW
vlan feature on the device and removing the module is enough for repro:

# modprobe 8021q
# ip l set dev eth2 up
# ethtool -K eth2 rx-vlan-filter off
# modprobe -r mlx5_ib
# modprobe -r mlx5_core
# cat /sys/kernel/debug/kmemleak
unreferenced object 0xffff888103dcd900 (size 256):
  comm "ip", pid 1490, jiffies 4294907305 (age 325.364s)
  hex dump (first 32 bytes):
    00 80 5d 03 81 88 ff ff 00 00 00 00 00 00 00 00  ..].............
    00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000899f3bb9>] kmalloc_trace+0x25/0x80
    [<000000002889a7a2>] vlan_vid_add+0xa0/0x210
    [<000000007177800e>] vlan_device_event+0x374/0x760 [8021q]
    [<000000009a0716b1>] notifier_call_chain+0x35/0xb0
    [<00000000bbf3d162>] __dev_notify_flags+0x58/0xf0
    [<0000000053d2b05d>] dev_change_flags+0x4d/0x60
    [<00000000982807e9>] do_setlink+0x28d/0x10a0
    [<0000000058c1be00>] __rtnl_newlink+0x545/0x980
    [<00000000e66c3bd9>] rtnl_newlink+0x44/0x70
    [<00000000a2cc5970>] rtnetlink_rcv_msg+0x29c/0x390
    [<00000000d307d1e4>] netlink_rcv_skb+0x54/0x100
    [<00000000259d16f9>] netlink_unicast+0x1f6/0x2c0
    [<000000007ce2afa1>] netlink_sendmsg+0x232/0x4a0
    [<00000000f3f4bb39>] sock_sendmsg+0x38/0x60
    [<000000002f9c0624>] ____sys_sendmsg+0x1e3/0x200
    [<00000000d6ff5520>] ___sys_sendmsg+0x80/0xc0
unreferenced object 0xffff88813354fde0 (size 32):
  comm "ip", pid 1490, jiffies 4294907305 (age 325.364s)
  hex dump (first 32 bytes):
    a0 d9 dc 03 81 88 ff ff a0 d9 dc 03 81 88 ff ff  ................
    81 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000899f3bb9>] kmalloc_trace+0x25/0x80
    [<000000002da64724>] vlan_vid_add+0xdf/0x210
    [<000000007177800e>] vlan_device_event+0x374/0x760 [8021q]
    [<000000009a0716b1>] notifier_call_chain+0x35/0xb0
    [<00000000bbf3d162>] __dev_notify_flags+0x58/0xf0
    [<0000000053d2b05d>] dev_change_flags+0x4d/0x60
    [<00000000982807e9>] do_setlink+0x28d/0x10a0
    [<0000000058c1be00>] __rtnl_newlink+0x545/0x980
    [<00000000e66c3bd9>] rtnl_newlink+0x44/0x70
    [<00000000a2cc5970>] rtnetlink_rcv_msg+0x29c/0x390
    [<00000000d307d1e4>] netlink_rcv_skb+0x54/0x100
    [<00000000259d16f9>] netlink_unicast+0x1f6/0x2c0
    [<000000007ce2afa1>] netlink_sendmsg+0x232/0x4a0
    [<00000000f3f4bb39>] sock_sendmsg+0x38/0x60
    [<000000002f9c0624>] ____sys_sendmsg+0x1e3/0x200
    [<00000000d6ff5520>] ___sys_sendmsg+0x80/0xc0
diff mbox series

Patch

diff --git a/net/8021q/vlan.c b/net/8021q/vlan.c
index e40aa3e3641c..b3662119ddbc 100644
--- a/net/8021q/vlan.c
+++ b/net/8021q/vlan.c
@@ -384,8 +384,7 @@  static int vlan_device_event(struct notifier_block *unused, unsigned long event,
 			dev->name);
 		vlan_vid_add(dev, htons(ETH_P_8021Q), 0);
 	}
-	if (event == NETDEV_DOWN &&
-	    (dev->features & NETIF_F_HW_VLAN_CTAG_FILTER))
+	if (event == NETDEV_DOWN)
 		vlan_vid_del(dev, htons(ETH_P_8021Q), 0);
 
 	vlan_info = rtnl_dereference(dev->vlan_info);