Message ID | 20230801095943.3650567-1-william.xuanziyang@huawei.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] vlan: Fix to delete vid only when by_dev has hw filter capable in vlan_vids_del_by_dev() | expand |
On Tue, Aug 01, 2023 at 05:59:43PM +0800, Ziyang Xuan wrote: > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c > index 0beb44f2fe1f..79cf4f033b66 100644 > --- a/net/8021q/vlan_core.c > +++ b/net/8021q/vlan_core.c > @@ -436,8 +436,11 @@ void vlan_vids_del_by_dev(struct net_device *dev, > if (!vlan_info) > return; > > - list_for_each_entry(vid_info, &vlan_info->vid_list, list) > + list_for_each_entry(vid_info, &vlan_info->vid_list, list) { > + if (!vlan_hw_filter_capable(by_dev, vid_info->proto)) > + continue; > vlan_vid_del(dev, vid_info->proto, vid_info->vid); vlan_vids_add_by_dev() does not have this check which means that memory will leak [1] if the device is enslaved after the bond already has a VLAN upper. I believe the correct solution is to explicitly set the STAG-related features [3] in the bond driver in a similar fashion to how it's done for the CTAG features. That way the bond driver will always propagate the VLAN info to its slaves. Please check the team driver as well. I think it's suffering from the same problem. If so, please fix it in a separate patch. [1] unreferenced object 0xffff888103efbd00 (size 256): comm "ip", pid 351, jiffies 4294763177 (age 19.697s) hex dump (first 32 bytes): 00 10 7a 11 81 88 ff ff 00 00 00 00 00 00 00 00 ..z............. 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ backtrace: [<ffffffff81a88c0a>] kmalloc_trace+0x2a/0xe0 [<ffffffff840f349c>] vlan_vid_add+0x30c/0x790 [<ffffffff840f4a68>] vlan_vids_add_by_dev+0x148/0x390 [<ffffffff82eea5e8>] bond_enslave+0xaf8/0x5d50 [<ffffffff837fbfd1>] do_set_master+0x1c1/0x220 [<ffffffff8380711c>] do_setlink+0xa0c/0x3fa0 [<ffffffff8381ee09>] __rtnl_newlink+0xc09/0x18c0 [<ffffffff8381fb2c>] rtnl_newlink+0x6c/0xa0 [<ffffffff8381d4ee>] rtnetlink_rcv_msg+0x43e/0xe00 [<ffffffff83a2c920>] netlink_rcv_skb+0x170/0x440 [<ffffffff83a2a2cf>] netlink_unicast+0x53f/0x810 [<ffffffff83a2af0b>] netlink_sendmsg+0x96b/0xe90 [<ffffffff83708e0f>] ____sys_sendmsg+0x30f/0xa70 [<ffffffff837129fa>] ___sys_sendmsg+0x13a/0x1e0 [<ffffffff83712c6c>] __sys_sendmsg+0x11c/0x1f0 [<ffffffff842fe5c8>] do_syscall_64+0x38/0x80 unreferenced object 0xffff888112ffab60 (size 32): comm "ip", pid 351, jiffies 4294763177 (age 19.697s) hex dump (first 32 bytes): a0 bd ef 03 81 88 ff ff a0 bd ef 03 81 88 ff ff ................ 88 a8 0a 00 01 00 00 00 cc cc cc cc cc cc cc cc ................ backtrace: [<ffffffff81a88c0a>] kmalloc_trace+0x2a/0xe0 [<ffffffff840f3599>] vlan_vid_add+0x409/0x790 [<ffffffff840f4a68>] vlan_vids_add_by_dev+0x148/0x390 [<ffffffff82eea5e8>] bond_enslave+0xaf8/0x5d50 [<ffffffff837fbfd1>] do_set_master+0x1c1/0x220 [<ffffffff8380711c>] do_setlink+0xa0c/0x3fa0 [<ffffffff8381ee09>] __rtnl_newlink+0xc09/0x18c0 [<ffffffff8381fb2c>] rtnl_newlink+0x6c/0xa0 [<ffffffff8381d4ee>] rtnetlink_rcv_msg+0x43e/0xe00 [<ffffffff83a2c920>] netlink_rcv_skb+0x170/0x440 [<ffffffff83a2a2cf>] netlink_unicast+0x53f/0x810 [<ffffffff83a2af0b>] netlink_sendmsg+0x96b/0xe90 [<ffffffff83708e0f>] ____sys_sendmsg+0x30f/0xa70 [<ffffffff837129fa>] ___sys_sendmsg+0x13a/0x1e0 [<ffffffff83712c6c>] __sys_sendmsg+0x11c/0x1f0 [<ffffffff842fe5c8>] do_syscall_64+0x38/0x80 [2] #!/bin/bash ip link add name dummy1 type dummy ip link add bond1 type bond mode 802.3ad ip link add link bond1 name bond1.10 type vlan id 10 protocol 802.1ad ip link set dev dummy1 master bond1 ip link del dev dummy1 [3] diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index 484c9e3e5e82..447b06ea4fc9 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -5901,7 +5901,9 @@ void bond_setup(struct net_device *bond_dev) bond_dev->hw_features = BOND_VLAN_FEATURES | NETIF_F_HW_VLAN_CTAG_RX | - NETIF_F_HW_VLAN_CTAG_FILTER; + NETIF_F_HW_VLAN_CTAG_FILTER | + NETIF_F_HW_VLAN_STAG_RX | + NETIF_F_HW_VLAN_STAG_FILTER; bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL; bond_dev->features |= bond_dev->hw_features; > + } > } > EXPORT_SYMBOL(vlan_vids_del_by_dev); > > -- > 2.25.1 > >
> On Tue, Aug 01, 2023 at 05:59:43PM +0800, Ziyang Xuan wrote: >> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c >> index 0beb44f2fe1f..79cf4f033b66 100644 >> --- a/net/8021q/vlan_core.c >> +++ b/net/8021q/vlan_core.c >> @@ -436,8 +436,11 @@ void vlan_vids_del_by_dev(struct net_device *dev, >> if (!vlan_info) >> return; >> >> - list_for_each_entry(vid_info, &vlan_info->vid_list, list) >> + list_for_each_entry(vid_info, &vlan_info->vid_list, list) { >> + if (!vlan_hw_filter_capable(by_dev, vid_info->proto)) >> + continue; >> vlan_vid_del(dev, vid_info->proto, vid_info->vid); > > vlan_vids_add_by_dev() does not have this check which means that memory > will leak [1] if the device is enslaved after the bond already has a > VLAN upper. > > I believe the correct solution is to explicitly set the STAG-related > features [3] in the bond driver in a similar fashion to how it's done > for the CTAG features. That way the bond driver will always propagate > the VLAN info to its slaves. > Thank you for your detailed analysis and valuable suggestions. I will fix my patch according to your suggestions. > Please check the team driver as well. I think it's suffering from the > same problem. If so, please fix it in a separate patch. > Yes, I will test, and fix it if there is the same bug. Thank you. William Xuan > [1] > unreferenced object 0xffff888103efbd00 (size 256): > comm "ip", pid 351, jiffies 4294763177 (age 19.697s) > hex dump (first 32 bytes): > 00 10 7a 11 81 88 ff ff 00 00 00 00 00 00 00 00 ..z............. > 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................ > backtrace: > [<ffffffff81a88c0a>] kmalloc_trace+0x2a/0xe0 > [<ffffffff840f349c>] vlan_vid_add+0x30c/0x790 > [<ffffffff840f4a68>] vlan_vids_add_by_dev+0x148/0x390 > [<ffffffff82eea5e8>] bond_enslave+0xaf8/0x5d50 > [<ffffffff837fbfd1>] do_set_master+0x1c1/0x220 > [<ffffffff8380711c>] do_setlink+0xa0c/0x3fa0 > [<ffffffff8381ee09>] __rtnl_newlink+0xc09/0x18c0 > [<ffffffff8381fb2c>] rtnl_newlink+0x6c/0xa0 > [<ffffffff8381d4ee>] rtnetlink_rcv_msg+0x43e/0xe00 > [<ffffffff83a2c920>] netlink_rcv_skb+0x170/0x440 > [<ffffffff83a2a2cf>] netlink_unicast+0x53f/0x810 > [<ffffffff83a2af0b>] netlink_sendmsg+0x96b/0xe90 > [<ffffffff83708e0f>] ____sys_sendmsg+0x30f/0xa70 > [<ffffffff837129fa>] ___sys_sendmsg+0x13a/0x1e0 > [<ffffffff83712c6c>] __sys_sendmsg+0x11c/0x1f0 > [<ffffffff842fe5c8>] do_syscall_64+0x38/0x80 > unreferenced object 0xffff888112ffab60 (size 32): > comm "ip", pid 351, jiffies 4294763177 (age 19.697s) > hex dump (first 32 bytes): > a0 bd ef 03 81 88 ff ff a0 bd ef 03 81 88 ff ff ................ > 88 a8 0a 00 01 00 00 00 cc cc cc cc cc cc cc cc ................ > backtrace: > [<ffffffff81a88c0a>] kmalloc_trace+0x2a/0xe0 > [<ffffffff840f3599>] vlan_vid_add+0x409/0x790 > [<ffffffff840f4a68>] vlan_vids_add_by_dev+0x148/0x390 > [<ffffffff82eea5e8>] bond_enslave+0xaf8/0x5d50 > [<ffffffff837fbfd1>] do_set_master+0x1c1/0x220 > [<ffffffff8380711c>] do_setlink+0xa0c/0x3fa0 > [<ffffffff8381ee09>] __rtnl_newlink+0xc09/0x18c0 > [<ffffffff8381fb2c>] rtnl_newlink+0x6c/0xa0 > [<ffffffff8381d4ee>] rtnetlink_rcv_msg+0x43e/0xe00 > [<ffffffff83a2c920>] netlink_rcv_skb+0x170/0x440 > [<ffffffff83a2a2cf>] netlink_unicast+0x53f/0x810 > [<ffffffff83a2af0b>] netlink_sendmsg+0x96b/0xe90 > [<ffffffff83708e0f>] ____sys_sendmsg+0x30f/0xa70 > [<ffffffff837129fa>] ___sys_sendmsg+0x13a/0x1e0 > [<ffffffff83712c6c>] __sys_sendmsg+0x11c/0x1f0 > [<ffffffff842fe5c8>] do_syscall_64+0x38/0x80 > > [2] > #!/bin/bash > > ip link add name dummy1 type dummy > ip link add bond1 type bond mode 802.3ad > ip link add link bond1 name bond1.10 type vlan id 10 protocol 802.1ad > ip link set dev dummy1 master bond1 > ip link del dev dummy1 > > [3] > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index 484c9e3e5e82..447b06ea4fc9 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -5901,7 +5901,9 @@ void bond_setup(struct net_device *bond_dev) > > bond_dev->hw_features = BOND_VLAN_FEATURES | > NETIF_F_HW_VLAN_CTAG_RX | > - NETIF_F_HW_VLAN_CTAG_FILTER; > + NETIF_F_HW_VLAN_CTAG_FILTER | > + NETIF_F_HW_VLAN_STAG_RX | > + NETIF_F_HW_VLAN_STAG_FILTER; > > bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL; > bond_dev->features |= bond_dev->hw_features; > >> + } >> } >> EXPORT_SYMBOL(vlan_vids_del_by_dev); >> >> -- >> 2.25.1 >> >> > > . >
diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c index 0beb44f2fe1f..79cf4f033b66 100644 --- a/net/8021q/vlan_core.c +++ b/net/8021q/vlan_core.c @@ -436,8 +436,11 @@ void vlan_vids_del_by_dev(struct net_device *dev, if (!vlan_info) return; - list_for_each_entry(vid_info, &vlan_info->vid_list, list) + list_for_each_entry(vid_info, &vlan_info->vid_list, list) { + if (!vlan_hw_filter_capable(by_dev, vid_info->proto)) + continue; vlan_vid_del(dev, vid_info->proto, vid_info->vid); + } } EXPORT_SYMBOL(vlan_vids_del_by_dev);
BUG_ON(!vlan_info) is triggered in unregister_vlan_dev() with following testcase: # ip netns add ns1 # ip netns exec ns1 ip link add bond0 type bond mode 0 # ip netns exec ns1 ip link add bond_slave_1 type veth peer veth2 # ip netns exec ns1 ip link set bond_slave_1 master bond0 # ip netns exec ns1 ip link add link bond_slave_1 name vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link add link bond0 name bond0_vlan10 type vlan id 10 protocol 802.1ad # ip netns exec ns1 ip link set bond_slave_1 nomaster # ip netns del ns1 The logical analysis of the problem is as follows: 1. create ETH_P_8021AD protocol vlan10 for bond_slave_1: register_vlan_dev() vlan_vid_add() vlan_info_alloc() // allocate vlan_info for bond_slave_1 __vlan_vid_add() // add [ETH_P_8021AD, 10] vid to bond_slave_1 2. create ETH_P_8021AD protocol bond0_vlan10 for bond0: register_vlan_dev() vlan_vid_add() __vlan_vid_add() vlan_add_rx_filter_info() if (!vlan_hw_filter_capable(dev, proto)) // condition established because bond0 without NETIF_F_HW_VLAN_STAG_FILTER return 0; if (netif_device_present(dev)) return dev->netdev_ops->ndo_vlan_rx_add_vid(dev, proto, vid); // will be never called // The slaves of bond0 will not refer to the [ETH_P_8021AD, 10] vid. 3. detach bond_slave_1 from bond0: __bond_release_one() vlan_vids_del_by_dev() list_for_each_entry(vid_info, &vlan_info->vid_list, list) vlan_vid_del(dev, vid_info->proto, vid_info->vid); // bond_slave_1 [ETH_P_8021AD, 10] vid will be deleted. // bond_slave_1->vlan_info will be assigned NULL. 4. delete vlan10 during delete ns1: default_device_exit_batch() dev->rtnl_link_ops->dellink() // unregister_vlan_dev() for vlan10 vlan_info = rtnl_dereference(real_dev->vlan_info); // real_dev of vlan10 is bond_slave_1 BUG_ON(!vlan_info); // bond_slave_1->vlan_info is NULL now, bug is triggered!!! Add vlan_hw_filter_capable() check for by_dev when delete vids in vlan_vids_del_by_dev() to fix the bug. Fixes: 8ad227ff89a7 ("net: vlan: add 802.1ad support") Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> --- net/8021q/vlan_core.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)