Message ID | 20201123031716.6179-1-jarod@redhat.com |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | [net] bonding: fix feature flag setting at init time |
Context | Check | Description |
---|---|---|
netdev/apply | fail | Patch does not apply to net |
netdev/tree_selection | success | Clearly marked for net |
On Sun, 22 Nov 2020 22:17:16 -0500
Jarod Wilson <jarod@redhat.com> wrote:

> Have run into a case where bond_option_mode_set() gets called before
> hw_features has been filled in, and very bad things happen when
> netdev_change_features() then gets called, because the empty hw_features
> wipes out almost all features. Further reading of netdev feature flag
> documentation suggests drivers aren't supposed to touch wanted_features,
> so this changes bond_option_mode_set() to use netdev_increment_features()
> and &= ~BOND_XFRM_FEATURES on mode changes and then only calling
> netdev_features_change() if there was actually a change of features. This
> specifically fixes bonding on top of mlxsw interfaces, and has been
> regression-tested with ixgbe interfaces. This change also simplifies the
> xfrm-specific code in bond_setup() a little bit as well.

Hi Jarod,

the stated reason is not correct. The problem is not an empty ->hw_features
but an empty ->wanted_features. During bond device creation bond_newlink() is
called. It calls bond_changelink() first and register_netdevice() afterwards.
->wanted_features is only initialized in register_netdevice(), so during the
bond_changelink() call ->wanted_features is still 0.

So...

bond_newlink()
 -> bond_changelink()
  -> __bond_opt_set()
   -> bond_option_mode_set()
    -> netdev_change_features()
     -> __netdev_update_features()
          features = netdev_get_wanted_features()
          { (dev->features & ~dev->hw_features) | dev->wanted_features }

dev->wanted_features is zero here, so only the first half of the expression
contributes, and every bit that is also set in ->hw_features gets cleared
from dev->features.

For mlxsw the bit that matters is NETIF_F_HW_VLAN_CTAG_FILTER: once it is
cleared on the bonding device, vlan_add_rx_filter_info() does not call
bond_vlan_rx_add_vid(), so mlxsw_sp_port_add_vid() is not called either.
This later causes a WARN in mlxsw_sp_inetaddr_port_vlan_event(), because no
mlxsw_sp_port_vlan instance exists when mlxsw_sp_port_add_vid() was never
called.

Btw., wouldn't it be enough to run the existing snippet in
bond_option_mode_set() only when the device is already registered? E.g.:

diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..ca4913fee5a9 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -768,11 +768,13 @@ static int bond_option_mode_set(struct bonding *bond,
 		bond->params.tlb_dynamic_lb = 1;
 
 #ifdef CONFIG_XFRM_OFFLOAD
-	if (newval->value == BOND_MODE_ACTIVEBACKUP)
-		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
-	else
-		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-	netdev_change_features(bond->dev);
+	if (bond->dev->reg_state == NETREG_REGISTERED) {
+		if (newval->value == BOND_MODE_ACTIVEBACKUP)
+			bond->dev->wanted_features |= BOND_XFRM_FEATURES;
+		else
+			bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
+		netdev_change_features(bond->dev);
+	}
 #endif /* CONFIG_XFRM_OFFLOAD */

Thanks,
Ivan
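
The arithmetic in the reply above can be reproduced in user space. The following is a minimal sketch, not kernel code: `features_t`, `struct fake_netdev`, the `F_*` bit values and both helper functions are hypothetical stand-ins invented for illustration, and only the expression inside `get_wanted_features()` is taken from the message. The initial state assumes, as the bond_setup() code in this thread suggests, that `features` is `hw_features` plus a few always-on bits, while `wanted_features` is still zero because registration has not happened yet.

```c
#include <stdint.h>

/* Hypothetical stand-ins for the kernel type and flag bits. The values are
 * illustrative only and do not match the real NETIF_F_* definitions. */
typedef uint64_t features_t;

#define F_SG               (1ULL << 0)
#define F_HW_CSUM          (1ULL << 1)
#define F_VLAN_CTAG_FILTER (1ULL << 2)
#define F_VLAN_CTAG_TX     (1ULL << 3)

struct fake_netdev {
	features_t features;        /* currently active features */
	features_t hw_features;     /* features that may be toggled */
	features_t wanted_features; /* features requested for the device */
};

/* Same expression as the one quoted for netdev_get_wanted_features(). */
static features_t get_wanted_features(const struct fake_netdev *dev)
{
	return (dev->features & ~dev->hw_features) | dev->wanted_features;
}

/* Assumed device state between setup and registration: features and
 * hw_features are populated, wanted_features has not been set yet. */
static struct fake_netdev unregistered_bond(void)
{
	struct fake_netdev dev = {
		.hw_features = F_SG | F_HW_CSUM | F_VLAN_CTAG_FILTER,
		.wanted_features = 0,
	};

	dev.features = dev.hw_features | F_VLAN_CTAG_TX;
	return dev;
}
```

With this state, `get_wanted_features()` keeps only the bit that is set in `features` but not in `hw_features`, so `F_VLAN_CTAG_FILTER` is dropped along with everything else in `hw_features`. Setting `wanted_features = features & hw_features` first, which is what registration is understood to do, makes the same call return the full `features` value.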
Hi Jarod,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on net-next/master]
[also build test ERROR on next-20201123]
[cannot apply to net/master linux/master linus/master sparc-next/master v5.10-rc5]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Jarod-Wilson/bonding-fix-feature-flag-setting-at-init-time/20201123-111956
base:   https://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next.git f9e425e99b0756c1479042afe761073779df2a30
config: x86_64-rhel (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/6d883c4c2b01573ba9dddcb9fe109f961a8b7f10
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Jarod-Wilson/bonding-fix-feature-flag-setting-at-init-time/20201123-111956
        git checkout 6d883c4c2b01573ba9dddcb9fe109f961a8b7f10
        # save the attached .config to linux build tree
        make W=1 ARCH=x86_64

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   drivers/net/bonding/bond_options.c: In function 'bond_option_mode_set':
>> drivers/net/bonding/bond_options.c:752:38: error: 'BOND_XFRM_FEATURES' undeclared (first use in this function)
     752 |  netdev_features_t mask = features & BOND_XFRM_FEATURES;
         |                                      ^~~~~~~~~~~~~~~~~~
   drivers/net/bonding/bond_options.c:752:38: note: each undeclared identifier is reported only once for each function it appears in
   drivers/net/bonding/bond_options.c:752:20: warning: unused variable 'mask' [-Wunused-variable]
     752 |  netdev_features_t mask = features & BOND_XFRM_FEATURES;
         |                    ^~~~

vim +/BOND_XFRM_FEATURES +752 drivers/net/bonding/bond_options.c

   747
   748	static int bond_option_mode_set(struct bonding *bond,
   749					const struct bond_opt_value *newval)
   750	{
   751		netdev_features_t features = bond->dev->features;
 > 752		netdev_features_t mask = features & BOND_XFRM_FEATURES;
   753
   754		if (!bond_mode_uses_arp(newval->value)) {
   755			if (bond->params.arp_interval) {
   756				netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
   757					   newval->string);
   758				/* disable arp monitoring */
   759				bond->params.arp_interval = 0;
   760			}
   761
   762			if (!bond->params.miimon) {
   763				/* set miimon to default value */
   764				bond->params.miimon = BOND_DEFAULT_MIIMON;
   765				netdev_dbg(bond->dev, "Setting MII monitoring interval to %d\n",
   766					   bond->params.miimon);
   767			}
   768		}
   769
   770		if (newval->value == BOND_MODE_ALB)
   771			bond->params.tlb_dynamic_lb = 1;
   772

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
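
The report does not say why the identifier is undeclared. A plausible reading, and it is only an assumption here, is that the macro is defined solely when CONFIG_XFRM_OFFLOAD is enabled, while the two new local variables sit outside the matching #ifdef. The sketch below shows one way to keep such declarations inside the guard. `FAKE_CONFIG_XFRM_OFFLOAD`, `FAKE_XFRM_FEATURES`, `features_t` and `mode_set_features()` are hypothetical names, and the plain OR is a simplification rather than the patch's actual feature computation.

```c
#include <stdint.h>

typedef uint64_t features_t;

/* Uncomment to emulate a build with the config option enabled. */
/* #define FAKE_CONFIG_XFRM_OFFLOAD 1 */

#ifdef FAKE_CONFIG_XFRM_OFFLOAD
/* Only visible when the option is on, like the assumed kernel macro. */
#define FAKE_XFRM_FEATURES ((features_t)0x7)
#endif

/* Returns the feature set after a mode change. Every use of the guarded
 * macro stays inside the same guard, so the function builds either way. */
static features_t mode_set_features(features_t features, int active_backup)
{
#ifdef FAKE_CONFIG_XFRM_OFFLOAD
	if (active_backup)
		features |= FAKE_XFRM_FEATURES;
	else
		features &= ~FAKE_XFRM_FEATURES;
#else
	(void)active_backup; /* nothing to toggle without the option */
#endif
	return features;
}
```

With the guard left undefined the function returns its input unchanged and no guarded identifier is referenced, which is the property the failing build lacked.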
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
index 71c9677d135f..b8e0cb4f9480 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4721,15 +4721,13 @@ void bond_setup(struct net_device *bond_dev)
 				NETIF_F_HW_VLAN_CTAG_FILTER;
 
 	bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL;
-#ifdef CONFIG_XFRM_OFFLOAD
-	bond_dev->hw_features |= BOND_XFRM_FEATURES;
-#endif /* CONFIG_XFRM_OFFLOAD */
 	bond_dev->features |= bond_dev->hw_features;
 	bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX;
 #ifdef CONFIG_XFRM_OFFLOAD
-	/* Disable XFRM features if this isn't an active-backup config */
-	if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP)
-		bond_dev->features &= ~BOND_XFRM_FEATURES;
+	bond_dev->hw_features |= BOND_XFRM_FEATURES;
+	/* Only enable XFRM features if this is an active-backup config */
+	if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP)
+		bond_dev->features |= BOND_XFRM_FEATURES;
 #endif /* CONFIG_XFRM_OFFLOAD */
 }
 
diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c
index 9abfaae1c6f7..bce34648d97d 100644
--- a/drivers/net/bonding/bond_options.c
+++ b/drivers/net/bonding/bond_options.c
@@ -748,6 +748,9 @@ const struct bond_option *bond_opt_get(unsigned int option)
 static int bond_option_mode_set(struct bonding *bond,
 				const struct bond_opt_value *newval)
 {
+	netdev_features_t features = bond->dev->features;
+	netdev_features_t mask = features & BOND_XFRM_FEATURES;
+
 	if (!bond_mode_uses_arp(newval->value)) {
 		if (bond->params.arp_interval) {
 			netdev_dbg(bond->dev, "%s mode is incompatible with arp monitoring, start mii monitoring\n",
@@ -769,10 +772,15 @@ static int bond_option_mode_set(struct bonding *bond,
 
 #ifdef CONFIG_XFRM_OFFLOAD
 	if (newval->value == BOND_MODE_ACTIVEBACKUP)
-		bond->dev->wanted_features |= BOND_XFRM_FEATURES;
+		features = netdev_increment_features(features,
+						     BOND_XFRM_FEATURES, mask);
 	else
-		bond->dev->wanted_features &= ~BOND_XFRM_FEATURES;
-	netdev_change_features(bond->dev);
+		features &= ~BOND_XFRM_FEATURES;
+
+	if (bond->dev->features != features) {
+		bond->dev->features = features;
+		netdev_features_change(bond->dev);
+	}
 #endif /* CONFIG_XFRM_OFFLOAD */
 
 	/* don't cache arp_validate between modes */
Have run into a case where bond_option_mode_set() gets called before
hw_features has been filled in, and very bad things happen when
netdev_change_features() then gets called, because the empty hw_features
wipes out almost all features. Further reading of netdev feature flag
documentation suggests drivers aren't supposed to touch wanted_features,
so this changes bond_option_mode_set() to use netdev_increment_features()
and &= ~BOND_XFRM_FEATURES on mode changes and then only calling
netdev_features_change() if there was actually a change of features. This
specifically fixes bonding on top of mlxsw interfaces, and has been
regression-tested with ixgbe interfaces. This change also simplifies the
xfrm-specific code in bond_setup() a little bit as well.

Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load")
Reported-by: Ivan Vecera <ivecera@redhat.com>
Cc: Jay Vosburgh <j.vosburgh@gmail.com>
Cc: Veaceslav Falico <vfalico@gmail.com>
Cc: Andy Gospodarek <andy@greyhouse.net>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Thomas Davis <tadavis@lbl.gov>
Cc: netdev@vger.kernel.org
Signed-off-by: Jarod Wilson <jarod@redhat.com>
---
 drivers/net/bonding/bond_main.c    | 10 ++++------
 drivers/net/bonding/bond_options.c | 14 +++++++++++---
 2 files changed, 15 insertions(+), 9 deletions(-)
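
The "only notify when something actually changed" part of the commit message can be illustrated outside the kernel. This is a rough sketch under stated assumptions: `struct fake_bond`, `FAKE_XFRM_FEATURES`, `fake_features_change()` and `fake_mode_set()` are hypothetical names, a counter stands in for the real notification call, and a plain OR stands in for netdev_increment_features(), whose real semantics are more involved.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t features_t;

/* Hypothetical bit range standing in for the XFRM offload features. */
#define FAKE_XFRM_FEATURES ((features_t)0x7 << 8)

struct fake_bond {
	features_t features;        /* currently active features */
	unsigned int notifications; /* how often a change was announced */
};

/* Stand-in for the notification call: just counts invocations. */
static void fake_features_change(struct fake_bond *bond)
{
	bond->notifications++;
}

/* Compute the new feature set on a local copy, then commit and notify
 * only if it differs from what the device already has. */
static void fake_mode_set(struct fake_bond *bond, bool active_backup)
{
	features_t features = bond->features;

	if (active_backup)
		features |= FAKE_XFRM_FEATURES;  /* simplified enable */
	else
		features &= ~FAKE_XFRM_FEATURES; /* disable on other modes */

	if (bond->features != features) {
		bond->features = features;
		fake_features_change(bond);
	}
}
```

Calling `fake_mode_set()` twice in a row with the same mode leaves the counter untouched on the second call, since the computed feature set already matches the stored one.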