Message ID | 20230911094636.3256542-1-william.xuanziyang@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v4] team: fix null-ptr-deref when team device type is changed | expand |
On Mon, 2023-09-11 at 17:46 +0800, Ziyang Xuan wrote: > Get a null-ptr-deref bug as follows with reproducer [1]. > > BUG: kernel NULL pointer dereference, address: 0000000000000228 > ... > RIP: 0010:vlan_dev_hard_header+0x35/0x140 [8021q] > ... > Call Trace: > <TASK> > ? __die+0x24/0x70 > ? page_fault_oops+0x82/0x150 > ? exc_page_fault+0x69/0x150 > ? asm_exc_page_fault+0x26/0x30 > ? vlan_dev_hard_header+0x35/0x140 [8021q] > ? vlan_dev_hard_header+0x8e/0x140 [8021q] > neigh_connected_output+0xb2/0x100 > ip6_finish_output2+0x1cb/0x520 > ? nf_hook_slow+0x43/0xc0 > ? ip6_mtu+0x46/0x80 > ip6_finish_output+0x2a/0xb0 > mld_sendpack+0x18f/0x250 > mld_ifc_work+0x39/0x160 > process_one_work+0x1e6/0x3f0 > worker_thread+0x4d/0x2f0 > ? __pfx_worker_thread+0x10/0x10 > kthread+0xe5/0x120 > ? __pfx_kthread+0x10/0x10 > ret_from_fork+0x34/0x50 > ? __pfx_kthread+0x10/0x10 > ret_from_fork_asm+0x1b/0x30 > > [1] > $ teamd -t team0 -d -c '{"runner": {"name": "loadbalance"}}' > $ ip link add name t-dummy type dummy > $ ip link add link t-dummy name t-dummy.100 type vlan id 100 > $ ip link add name t-nlmon type nlmon > $ ip link set t-nlmon master team0 > $ ip link set t-nlmon nomaster > $ ip link set t-dummy up > $ ip link set team0 up > $ ip link set t-dummy.100 down > $ ip link set t-dummy.100 master team0 > > When enslave a vlan device to team device and team device type is changed > from non-ether to ether, header_ops of team device is changed to > vlan_header_ops. That is incorrect and will trigger null-ptr-deref > for vlan->real_dev in vlan_dev_hard_header() because team device is not > a vlan device. > > Assign eth_header_ops to header_ops of team device when its type is changed > from non-ether to ether to fix the bug. > > Fixes: 1d76efe1577b ("team: add support for non-ethernet devices") > Suggested-by: Hangbin Liu <liuhangbin@gmail.com> > Reviewed-by: Hangbin Liu <liuhangbin@gmail.com> > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> > --- > v4: > - Add Reviewed-by tag. > v3: > - Export eth_header_ops to fix modpost error. > v2: > - Just modify header_ops to eth_header_ops not use ether_setup(). > --- > drivers/net/team/team.c | 5 ++++- > net/ethernet/eth.c | 1 + > 2 files changed, 5 insertions(+), 1 deletion(-) > > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > index d3dc22509ea5..12fb5f4cff06 100644 > --- a/drivers/net/team/team.c > +++ b/drivers/net/team/team.c > @@ -2127,7 +2127,10 @@ static const struct ethtool_ops team_ethtool_ops = { > static void team_setup_by_port(struct net_device *dev, > struct net_device *port_dev) > { > - dev->header_ops = port_dev->header_ops; > + if (port_dev->type == ARPHRD_ETHER) > + dev->header_ops = ð_header_ops; > + else > + dev->header_ops = port_dev->header_ops; > dev->type = port_dev->type; > dev->hard_header_len = port_dev->hard_header_len; > dev->needed_headroom = port_dev->needed_headroom; If I read correctly, for !vlan_hw_offload_capable() lower dev, egress packets going trough the team device will not contain the vlan tag. Additionally, why is vlan special? Why others lower devices with custom header_ops do not need any care? Exporting 'eth_header_ops' for team's sake only looks a bit too much to me. I think could instead cache the header_ops ptr after the initial ether_setup(). Thanks! Paolo
>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c >> index d3dc22509ea5..12fb5f4cff06 100644 >> --- a/drivers/net/team/team.c >> +++ b/drivers/net/team/team.c >> @@ -2127,7 +2127,10 @@ static const struct ethtool_ops team_ethtool_ops = { >> static void team_setup_by_port(struct net_device *dev, >> struct net_device *port_dev) >> { >> - dev->header_ops = port_dev->header_ops; >> + if (port_dev->type == ARPHRD_ETHER) >> + dev->header_ops = ð_header_ops; >> + else >> + dev->header_ops = port_dev->header_ops; >> dev->type = port_dev->type; >> dev->hard_header_len = port_dev->hard_header_len; >> dev->needed_headroom = port_dev->needed_headroom; > > If I read correctly, for !vlan_hw_offload_capable() lower dev, egress > packets going trough the team device will not contain the vlan tag. > > Additionally, why is vlan special? Why others lower devices with custom > header_ops do not need any care? We have also got ipvlan device problem as following: BUG: KASAN: slab-out-of-bounds in ipvlan_hard_header+0xd1/0xe0 Read of size 8 at addr ffff888018ee1de8 by task syz-executor.1/3469 ... Call Trace: <IRQ> dump_stack+0xbe/0xfd print_address_description.constprop.0+0x19/0x170 ? ipvlan_hard_header+0xd1/0xe0 __kasan_report.cold+0x6c/0x84 ? ipvlan_hard_header+0xd1/0xe0 kasan_report+0x3a/0x50 ipvlan_hard_header+0xd1/0xe0 ? ipvlan_get_iflink+0x40/0x40 neigh_resolve_output+0x28f/0x410 ip6_finish_output2+0x762/0xef0 ? ip6_frag_init+0xf0/0xf0 ? nf_nat_icmpv6_reply_translation+0x460/0x460 ? do_add_counters+0x370/0x370 ? do_add_counters+0x370/0x370 ? ipv6_confirm+0x1ee/0x360 ? nf_ct_bridge_unregister+0x50/0x50 __ip6_finish_output.part.0+0x1a8/0x400 ip6_finish_output+0x1a9/0x1e0 ip6_output+0x146/0x2b0 ? ip6_finish_output+0x1e0/0x1e0 ? __ip6_finish_output+0xb0/0xb0 ? __sanitizer_cov_trace_switch+0x50/0x90 ? nf_hook_slow+0xa3/0x150 mld_sendpack+0x3d9/0x670 ? mca_alloc+0x210/0x210 ? add_grhead+0xf5/0x140 ? ipv6_icmp_sysctl_init+0xd0/0xd0 ? add_grec+0x4e1/0xa90 ? _raw_spin_lock_bh+0x85/0xe0 ? _raw_read_unlock_irqrestore+0x30/0x30 mld_send_cr+0x426/0x630 ? migrate_swap_stop+0x400/0x400 mld_ifc_timer_expire+0x22/0x240 ? ipv6_mc_netdev_event+0x80/0x80 call_timer_fn+0x3d/0x230 ? ipv6_mc_netdev_event+0x80/0x80 expire_timers+0x190/0x270 run_timer_softirq+0x22c/0x560 ipvlan problem is slightly different from vlan. // add ipvlan to team device team_port_add team_dev_type_check_change team_setup_by_port // assign ipvlan_header_ops to team_dev->header_ops netdev_rx_handler_register // fail out without restore team_dev->header_ops // add other ether type device to team device team_port_add team_dev_type_check_change // return directly because port_dev->type and team_dev->type are same team_dev->head_ops has be assigned to ipvlan_header_ops. That will trigger excption. > > Exporting 'eth_header_ops' for team's sake only looks a bit too much to > me. I think could instead cache the header_ops ptr after the initial > ether_setup(). Is it possible to use ether_setup() like bonding driver andmodify MTU individually later? > > Thanks! > > Paolo > > > > > . >
On Wed, 2023-09-13 at 14:15 +0800, Ziyang Xuan (William) wrote: > > > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > > > index d3dc22509ea5..12fb5f4cff06 100644 > > > --- a/drivers/net/team/team.c > > > +++ b/drivers/net/team/team.c > > > @@ -2127,7 +2127,10 @@ static const struct ethtool_ops > > > team_ethtool_ops = { > > > static void team_setup_by_port(struct net_device *dev, > > > struct net_device *port_dev) > > > { > > > - dev->header_ops = port_dev->header_ops; > > > + if (port_dev->type == ARPHRD_ETHER) > > > + dev->header_ops = ð_header_ops; > > > + else > > > + dev->header_ops = port_dev->header_ops; > > > dev->type = port_dev->type; > > > dev->hard_header_len = port_dev->hard_header_len; > > > dev->needed_headroom = port_dev->needed_headroom; > > > > If I read correctly, for !vlan_hw_offload_capable() lower dev, > > egress > > packets going trough the team device will not contain the vlan tag. > > > > Additionally, why is vlan special? Why others lower devices with > > custom > > header_ops do not need any care? > > We have also got ipvlan device problem as following: > > BUG: KASAN: slab-out-of-bounds in ipvlan_hard_header+0xd1/0xe0 > Read of size 8 at addr ffff888018ee1de8 by task syz-executor.1/3469 > ... > Call Trace: > <IRQ> > dump_stack+0xbe/0xfd > print_address_description.constprop.0+0x19/0x170 > ? ipvlan_hard_header+0xd1/0xe0 > __kasan_report.cold+0x6c/0x84 > ? ipvlan_hard_header+0xd1/0xe0 > kasan_report+0x3a/0x50 > ipvlan_hard_header+0xd1/0xe0 > ? ipvlan_get_iflink+0x40/0x40 > neigh_resolve_output+0x28f/0x410 > ip6_finish_output2+0x762/0xef0 > ? ip6_frag_init+0xf0/0xf0 > ? nf_nat_icmpv6_reply_translation+0x460/0x460 > ? do_add_counters+0x370/0x370 > ? do_add_counters+0x370/0x370 > ? ipv6_confirm+0x1ee/0x360 > ? nf_ct_bridge_unregister+0x50/0x50 > __ip6_finish_output.part.0+0x1a8/0x400 > ip6_finish_output+0x1a9/0x1e0 > ip6_output+0x146/0x2b0 > ? ip6_finish_output+0x1e0/0x1e0 > ? __ip6_finish_output+0xb0/0xb0 > ? __sanitizer_cov_trace_switch+0x50/0x90 > ? nf_hook_slow+0xa3/0x150 > mld_sendpack+0x3d9/0x670 > ? mca_alloc+0x210/0x210 > ? add_grhead+0xf5/0x140 > ? ipv6_icmp_sysctl_init+0xd0/0xd0 > ? add_grec+0x4e1/0xa90 > ? _raw_spin_lock_bh+0x85/0xe0 > ? _raw_read_unlock_irqrestore+0x30/0x30 > mld_send_cr+0x426/0x630 > ? migrate_swap_stop+0x400/0x400 > mld_ifc_timer_expire+0x22/0x240 > ? ipv6_mc_netdev_event+0x80/0x80 > call_timer_fn+0x3d/0x230 > ? ipv6_mc_netdev_event+0x80/0x80 > expire_timers+0x190/0x270 > run_timer_softirq+0x22c/0x560 > > ipvlan problem is slightly different from vlan. > > // add ipvlan to team device > team_port_add > team_dev_type_check_change > team_setup_by_port // assign ipvlan_header_ops to team_dev- > >header_ops > netdev_rx_handler_register // fail out without restore team_dev- > >header_ops > > // add other ether type device to team device > team_port_add > team_dev_type_check_change // return directly because port_dev- > >type and team_dev->type are same > > team_dev->head_ops has be assigned to ipvlan_header_ops. That will > trigger excption. To me both cases look the same in the end: the team driver sets and use header_ops of a different device that will assume dev_priv() being a different struct. I'm guessing a generic solution could be implementing 'trampoline' header_ops that just call into the lower port corresponding op, and assigning such ops to the team device every time the lower has non ethernet header_ops. team_dev_type_check_change() should then probably check both dev->type and dev->header_ops. > > Exporting 'eth_header_ops' for team's sake only looks a bit too > > much to > > me. I think could instead cache the header_ops ptr after the > > initial > > ether_setup(). > > Is it possible to use ether_setup() like bonding driver andmodify MTU > individually later? That could be another option to get the eth_header_ops. Note that in the end both are quite similar, you will have to cache some info (the mtu with the latter); ether_setup() possibly will have more side effects, as it touches many fields. I personally would use the thing I suggested above. Cheers, Paolo
Wed, Sep 13, 2023 at 08:28:13AM CEST, pabeni@redhat.com wrote: >On Wed, 2023-09-13 at 14:15 +0800, Ziyang Xuan (William) wrote: >> > > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c >> > > index d3dc22509ea5..12fb5f4cff06 100644 >> > > --- a/drivers/net/team/team.c >> > > +++ b/drivers/net/team/team.c >> > > @@ -2127,7 +2127,10 @@ static const struct ethtool_ops >> > > team_ethtool_ops = { >> > > static void team_setup_by_port(struct net_device *dev, >> > > struct net_device *port_dev) >> > > { >> > > - dev->header_ops = port_dev->header_ops; >> > > + if (port_dev->type == ARPHRD_ETHER) >> > > + dev->header_ops = ð_header_ops; >> > > + else >> > > + dev->header_ops = port_dev->header_ops; >> > > dev->type = port_dev->type; >> > > dev->hard_header_len = port_dev->hard_header_len; >> > > dev->needed_headroom = port_dev->needed_headroom; >> > >> > If I read correctly, for !vlan_hw_offload_capable() lower dev, >> > egress >> > packets going trough the team device will not contain the vlan tag. >> > >> > Additionally, why is vlan special? Why others lower devices with >> > custom >> > header_ops do not need any care? >> >> We have also got ipvlan device problem as following: >> >> BUG: KASAN: slab-out-of-bounds in ipvlan_hard_header+0xd1/0xe0 >> Read of size 8 at addr ffff888018ee1de8 by task syz-executor.1/3469 >> ... >> Call Trace: >> <IRQ> >> dump_stack+0xbe/0xfd >> print_address_description.constprop.0+0x19/0x170 >> ? ipvlan_hard_header+0xd1/0xe0 >> __kasan_report.cold+0x6c/0x84 >> ? ipvlan_hard_header+0xd1/0xe0 >> kasan_report+0x3a/0x50 >> ipvlan_hard_header+0xd1/0xe0 >> ? ipvlan_get_iflink+0x40/0x40 >> neigh_resolve_output+0x28f/0x410 >> ip6_finish_output2+0x762/0xef0 >> ? ip6_frag_init+0xf0/0xf0 >> ? nf_nat_icmpv6_reply_translation+0x460/0x460 >> ? do_add_counters+0x370/0x370 >> ? do_add_counters+0x370/0x370 >> ? ipv6_confirm+0x1ee/0x360 >> ? nf_ct_bridge_unregister+0x50/0x50 >> __ip6_finish_output.part.0+0x1a8/0x400 >> ip6_finish_output+0x1a9/0x1e0 >> ip6_output+0x146/0x2b0 >> ? ip6_finish_output+0x1e0/0x1e0 >> ? __ip6_finish_output+0xb0/0xb0 >> ? __sanitizer_cov_trace_switch+0x50/0x90 >> ? nf_hook_slow+0xa3/0x150 >> mld_sendpack+0x3d9/0x670 >> ? mca_alloc+0x210/0x210 >> ? add_grhead+0xf5/0x140 >> ? ipv6_icmp_sysctl_init+0xd0/0xd0 >> ? add_grec+0x4e1/0xa90 >> ? _raw_spin_lock_bh+0x85/0xe0 >> ? _raw_read_unlock_irqrestore+0x30/0x30 >> mld_send_cr+0x426/0x630 >> ? migrate_swap_stop+0x400/0x400 >> mld_ifc_timer_expire+0x22/0x240 >> ? ipv6_mc_netdev_event+0x80/0x80 >> call_timer_fn+0x3d/0x230 >> ? ipv6_mc_netdev_event+0x80/0x80 >> expire_timers+0x190/0x270 >> run_timer_softirq+0x22c/0x560 >> >> ipvlan problem is slightly different from vlan. >> >> // add ipvlan to team device >> team_port_add >> team_dev_type_check_change >> team_setup_by_port // assign ipvlan_header_ops to team_dev- >> >header_ops >> netdev_rx_handler_register // fail out without restore team_dev- >> >header_ops >> >> // add other ether type device to team device >> team_port_add >> team_dev_type_check_change // return directly because port_dev- >> >type and team_dev->type are same >> >> team_dev->head_ops has be assigned to ipvlan_header_ops. That will >> trigger excption. > >To me both cases look the same in the end: the team driver sets and use >header_ops of a different device that will assume dev_priv() being a >different struct. > >I'm guessing a generic solution could be implementing 'trampoline' >header_ops that just call into the lower port corresponding op, and >assigning such ops to the team device every time the lower has non >ethernet header_ops. > >team_dev_type_check_change() should then probably check both dev->type >and dev->header_ops. > >> > Exporting 'eth_header_ops' for team's sake only looks a bit too >> > much to >> > me. I think could instead cache the header_ops ptr after the >> > initial >> > ether_setup(). >> >> Is it possible to use ether_setup() like bonding driver andmodify MTU >> individually later? > >That could be another option to get the eth_header_ops. > >Note that in the end both are quite similar, you will have to cache >some info (the mtu with the latter); ether_setup() possibly will have >more side effects, as it touches many fields. I personally would use >the thing I suggested above. Agreed. That is why ether_setup() was not used in the first place. > >Cheers, > >Paolo >
> On Wed, 2023-09-13 at 14:15 +0800, Ziyang Xuan (William) wrote: > > To me both cases look the same in the end: the team driver sets and use > header_ops of a different device that will assume dev_priv() being a > different struct. > > I'm guessing a generic solution could be implementing 'trampoline' > header_ops that just call into the lower port corresponding op, and > assigning such ops to the team device every time the lower has non > ethernet header_ops. > > team_dev_type_check_change() should then probably check both dev->type > and dev->header_ops. > >>> Exporting 'eth_header_ops' for team's sake only looks a bit too >>> much to >>> me. I think could instead cache the header_ops ptr after the >>> initial >>> ether_setup(). >> >> Is it possible to use ether_setup() like bonding driver andmodify MTU >> individually later? > > That could be another option to get the eth_header_ops. > > Note that in the end both are quite similar, you will have to cache > some info (the mtu with the latter); ether_setup() possibly will have > more side effects, as it touches many fields. I personally would use > the thing I suggested above. > Hi Pallo, Is it possible to modify like this? diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index d3dc22509ea5..8e6a87ba85aa 100644 --- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -2127,7 +2127,12 @@ static const struct ethtool_ops team_ethtool_ops = { static void team_setup_by_port(struct net_device *dev, struct net_device *port_dev) { - dev->header_ops = port_dev->header_ops; + struct team *team = netdev_priv(dev); + + if (port_dev->type == ARPHRD_ETHER) + dev->header_ops = team->header_ops_cache; + else + dev->header_ops = port_dev->header_ops; dev->type = port_dev->type; dev->hard_header_len = port_dev->hard_header_len; dev->needed_headroom = port_dev->needed_headroom; @@ -2174,8 +2179,11 @@ static int team_dev_type_check_change(struct net_device *dev, static void team_setup(struct net_device *dev) { + struct team *team = netdev_priv(dev); + ether_setup(dev); dev->max_mtu = ETH_MAX_MTU; + team->header_ops_cache = dev->header_ops; dev->netdev_ops = &team_netdev_ops; dev->ethtool_ops = &team_ethtool_ops; diff --git a/include/linux/if_team.h b/include/linux/if_team.h index 8de6b6e67829..34bcba5a7067 100644 --- a/include/linux/if_team.h +++ b/include/linux/if_team.h @@ -189,6 +189,8 @@ struct team { struct net_device *dev; /* associated netdevice */ struct team_pcpu_stats __percpu *pcpu_stats; + const struct header_ops *header_ops_cache; + struct mutex lock; /* used for overall locking, e.g. port lists write */ Thank you. > Cheers, > > Paolo > > . >
diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index d3dc22509ea5..12fb5f4cff06 100644 --- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -2127,7 +2127,10 @@ static const struct ethtool_ops team_ethtool_ops = { static void team_setup_by_port(struct net_device *dev, struct net_device *port_dev) { - dev->header_ops = port_dev->header_ops; + if (port_dev->type == ARPHRD_ETHER) + dev->header_ops = ð_header_ops; + else + dev->header_ops = port_dev->header_ops; dev->type = port_dev->type; dev->hard_header_len = port_dev->hard_header_len; dev->needed_headroom = port_dev->needed_headroom; diff --git a/net/ethernet/eth.c b/net/ethernet/eth.c index 2edc8b796a4e..157833509adb 100644 --- a/net/ethernet/eth.c +++ b/net/ethernet/eth.c @@ -347,6 +347,7 @@ const struct header_ops eth_header_ops ____cacheline_aligned = { .cache_update = eth_header_cache_update, .parse_protocol = eth_header_parse_protocol, }; +EXPORT_SYMBOL(eth_header_ops); /** * ether_setup - setup Ethernet network device