Message ID | 20201202173053.13800-1-jarod@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v2] bonding: fix feature flag setting at init time | expand |
Context | Check | Description |
---|---|---|
netdev/apply | fail | Patch does not apply to net |
netdev/tree_selection | success | Clearly marked for net |
On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson <jarod@redhat.com> wrote: > Don't try to adjust XFRM support flags if the bond device isn't yet > registered. Bad things can currently happen when netdev_change_features() > is called without having wanted_features fully filled in yet. Basically, > this code was racing against register_netdevice() filling in > wanted_features, and when it got there first, the empty wanted_features > led to features also getting emptied out, which was definitely not the > intended behavior, so prevent that from happening. > > Originally, I'd hoped to stop adjusting wanted_features at all in the > bonding driver, as it's documented as being something only the network > core should touch, but we actually do need to do this to properly update > both the features and wanted_features fields when changing the bond type, > or we get to a situation where ethtool sees: > > esp-hw-offload: off [requested on] > > I do think we should be using netdev_update_features instead of > netdev_change_features here though, so we only send notifiers when the > features actually changed. > > v2: rework based on further testing and suggestions from ivecera > > Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") > Reported-by: Ivan Vecera <ivecera@redhat.com> > Suggested-by: Ivan Vecera <ivecera@redhat.com> > Cc: Jay Vosburgh <j.vosburgh@gmail.com> > Cc: Veaceslav Falico <vfalico@gmail.com> > Cc: Andy Gospodarek <andy@greyhouse.net> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Thomas Davis <tadavis@lbl.gov> > Cc: netdev@vger.kernel.org > Signed-off-by: Jarod Wilson <jarod@redhat.com> > --- > drivers/net/bonding/bond_main.c | 10 ++++------ > drivers/net/bonding/bond_options.c | 6 +++++- > 2 files changed, 9 insertions(+), 7 deletions(-) > > diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c > index e0880a3840d7..5fe5232cc3f3 100644 > --- a/drivers/net/bonding/bond_main.c > +++ b/drivers/net/bonding/bond_main.c > @@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev) > NETIF_F_HW_VLAN_CTAG_FILTER; > > bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL; > -#ifdef CONFIG_XFRM_OFFLOAD > - bond_dev->hw_features |= BOND_XFRM_FEATURES; > -#endif /* CONFIG_XFRM_OFFLOAD */ > bond_dev->features |= bond_dev->hw_features; > bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX; > #ifdef CONFIG_XFRM_OFFLOAD > - /* Disable XFRM features if this isn't an active-backup config */ > - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) > - bond_dev->features &= ~BOND_XFRM_FEATURES; > + bond_dev->hw_features |= BOND_XFRM_FEATURES; > + /* Only enable XFRM features if this is an active-backup config */ > + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) > + bond_dev->features |= BOND_XFRM_FEATURES; > #endif /* CONFIG_XFRM_OFFLOAD */ > } > > diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c > index 9abfaae1c6f7..19205cfac751 100644 > --- a/drivers/net/bonding/bond_options.c > +++ b/drivers/net/bonding/bond_options.c > @@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond, > bond->params.tlb_dynamic_lb = 1; > > #ifdef CONFIG_XFRM_OFFLOAD > + if (bond->dev->reg_state != NETREG_REGISTERED) > + goto noreg; > + > if (newval->value == BOND_MODE_ACTIVEBACKUP) > bond->dev->wanted_features |= BOND_XFRM_FEATURES; > else > bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; > - netdev_change_features(bond->dev); > + netdev_update_features(bond->dev); > +noreg: > #endif /* CONFIG_XFRM_OFFLOAD */ > > /* don't cache arp_validate between modes */ Tested-by: Ivan Vecera <ivecera@redhat.com>
On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote: > + if (bond->dev->reg_state != NETREG_REGISTERED) > + goto noreg; > + > if (newval->value == BOND_MODE_ACTIVEBACKUP) > bond->dev->wanted_features |= BOND_XFRM_FEATURES; > else > bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; > - netdev_change_features(bond->dev); > + netdev_update_features(bond->dev); > +noreg: Why the goto?
Jarod Wilson <jarod@redhat.com> wrote: >Don't try to adjust XFRM support flags if the bond device isn't yet >registered. Bad things can currently happen when netdev_change_features() >is called without having wanted_features fully filled in yet. Basically, >this code was racing against register_netdevice() filling in >wanted_features, and when it got there first, the empty wanted_features >led to features also getting emptied out, which was definitely not the >intended behavior, so prevent that from happening. Is this an actual race? Reading Ivan's prior message, it sounds like it's an ordering problem (in that bond_newlink calls register_netdevice after bond_changelink). The change to bond_option_mode_set tests against reg_state, so presumably it wants to skip the first(?) time through, before the register_netdevice call; is that right? -J >Originally, I'd hoped to stop adjusting wanted_features at all in the >bonding driver, as it's documented as being something only the network >core should touch, but we actually do need to do this to properly update >both the features and wanted_features fields when changing the bond type, >or we get to a situation where ethtool sees: > > esp-hw-offload: off [requested on] > >I do think we should be using netdev_update_features instead of >netdev_change_features here though, so we only send notifiers when the >features actually changed. > >v2: rework based on further testing and suggestions from ivecera > >Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") >Reported-by: Ivan Vecera <ivecera@redhat.com> >Suggested-by: Ivan Vecera <ivecera@redhat.com> >Cc: Jay Vosburgh <j.vosburgh@gmail.com> >Cc: Veaceslav Falico <vfalico@gmail.com> >Cc: Andy Gospodarek <andy@greyhouse.net> >Cc: "David S. Miller" <davem@davemloft.net> >Cc: Jakub Kicinski <kuba@kernel.org> >Cc: Thomas Davis <tadavis@lbl.gov> >Cc: netdev@vger.kernel.org >Signed-off-by: Jarod Wilson <jarod@redhat.com> >--- > drivers/net/bonding/bond_main.c | 10 ++++------ > drivers/net/bonding/bond_options.c | 6 +++++- > 2 files changed, 9 insertions(+), 7 deletions(-) > >diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c >index e0880a3840d7..5fe5232cc3f3 100644 >--- a/drivers/net/bonding/bond_main.c >+++ b/drivers/net/bonding/bond_main.c >@@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev) > NETIF_F_HW_VLAN_CTAG_FILTER; > > bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL; >-#ifdef CONFIG_XFRM_OFFLOAD >- bond_dev->hw_features |= BOND_XFRM_FEATURES; >-#endif /* CONFIG_XFRM_OFFLOAD */ > bond_dev->features |= bond_dev->hw_features; > bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX; > #ifdef CONFIG_XFRM_OFFLOAD >- /* Disable XFRM features if this isn't an active-backup config */ >- if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) >- bond_dev->features &= ~BOND_XFRM_FEATURES; >+ bond_dev->hw_features |= BOND_XFRM_FEATURES; >+ /* Only enable XFRM features if this is an active-backup config */ >+ if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) >+ bond_dev->features |= BOND_XFRM_FEATURES; > #endif /* CONFIG_XFRM_OFFLOAD */ > } > >diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c >index 9abfaae1c6f7..19205cfac751 100644 >--- a/drivers/net/bonding/bond_options.c >+++ b/drivers/net/bonding/bond_options.c >@@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond, > bond->params.tlb_dynamic_lb = 1; > > #ifdef CONFIG_XFRM_OFFLOAD >+ if (bond->dev->reg_state != NETREG_REGISTERED) >+ goto noreg; >+ > if (newval->value == BOND_MODE_ACTIVEBACKUP) > bond->dev->wanted_features |= BOND_XFRM_FEATURES; > else > bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; >- netdev_change_features(bond->dev); >+ netdev_update_features(bond->dev); >+noreg: > > #endif /* CONFIG_XFRM_OFFLOAD */ > > /* don't cache arp_validate between modes */ >-- >2.28.0 > --- -Jay Vosburgh, jay.vosburgh@canonical.com
On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote: > > + if (bond->dev->reg_state != NETREG_REGISTERED) > > + goto noreg; > > + > > if (newval->value == BOND_MODE_ACTIVEBACKUP) > > bond->dev->wanted_features |= BOND_XFRM_FEATURES; > > else > > bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; > > - netdev_change_features(bond->dev); > > + netdev_update_features(bond->dev); > > +noreg: > > Why the goto? Seemed cleaner to prevent an extra level of indentation of the code following the goto and before the label, but I'm not that attached to it if it's not wanted for coding style reasons.
On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote: > On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote: > > > + if (bond->dev->reg_state != NETREG_REGISTERED) > > > + goto noreg; > > > + > > > if (newval->value == BOND_MODE_ACTIVEBACKUP) > > > bond->dev->wanted_features |= BOND_XFRM_FEATURES; > > > else > > > bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; > > > - netdev_change_features(bond->dev); > > > + netdev_update_features(bond->dev); > > > +noreg: > > > > Why the goto? > > Seemed cleaner to prevent an extra level of indentation of the code > following the goto and before the label, but I'm not that attached to > it if it's not wanted for coding style reasons. Yes, please don't use gotos where a normal if statement is sufficient. If you must avoid the indentation move the code to a helper. Also - this patch did not apply to net, please make sure you're developing on the correct base.
On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote: > > Jarod Wilson <jarod@redhat.com> wrote: > > >Don't try to adjust XFRM support flags if the bond device isn't yet > >registered. Bad things can currently happen when netdev_change_features() > >is called without having wanted_features fully filled in yet. Basically, > >this code was racing against register_netdevice() filling in > >wanted_features, and when it got there first, the empty wanted_features > >led to features also getting emptied out, which was definitely not the > >intended behavior, so prevent that from happening. > > Is this an actual race? Reading Ivan's prior message, it sounds > like it's an ordering problem (in that bond_newlink calls > register_netdevice after bond_changelink). Sorry, yeah, this is not actually a race condition, just an ordering issue, bond_check_params() gets called at init time, which leads to bond_option_mode_set() being called, and does so prior to bond_create() running, which is where we actually call register_netdevice(). > The change to bond_option_mode_set tests against reg_state, so > presumably it wants to skip the first(?) time through, before the > register_netdevice call; is that right? Correct. Later on, when the bonding driver is already loaded, and parameter changes are made, bond_option_mode_set() gets called and if the mode changes to or from active-backup, we do need/want this code to run to update wanted and features flags properly.
On Wed, Dec 2, 2020 at 2:23 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 2 Dec 2020 14:03:53 -0500 Jarod Wilson wrote: > > On Wed, Dec 2, 2020 at 12:53 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > On Wed, 2 Dec 2020 12:30:53 -0500 Jarod Wilson wrote: > > > > + if (bond->dev->reg_state != NETREG_REGISTERED) > > > > + goto noreg; > > > > + > > > > if (newval->value == BOND_MODE_ACTIVEBACKUP) > > > > bond->dev->wanted_features |= BOND_XFRM_FEATURES; > > > > else > > > > bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; > > > > - netdev_change_features(bond->dev); > > > > + netdev_update_features(bond->dev); > > > > +noreg: > > > > > > Why the goto? > > > > Seemed cleaner to prevent an extra level of indentation of the code > > following the goto and before the label, but I'm not that attached to > > it if it's not wanted for coding style reasons. > > Yes, please don't use gotos where a normal if statement is sufficient. > If you must avoid the indentation move the code to a helper. > > Also - this patch did not apply to net, please make sure you're > developing on the correct base. Argh, I must have been working in net-next instead of net, apologies. Okay, I'll clarify the description per what Jay pointed out and adjust the code to not include a goto, then make it on the right branch.
Jarod Wilson <jarod@redhat.com> wrote: >On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote: >> >> Jarod Wilson <jarod@redhat.com> wrote: >> >> >Don't try to adjust XFRM support flags if the bond device isn't yet >> >registered. Bad things can currently happen when netdev_change_features() >> >is called without having wanted_features fully filled in yet. Basically, >> >this code was racing against register_netdevice() filling in >> >wanted_features, and when it got there first, the empty wanted_features >> >led to features also getting emptied out, which was definitely not the >> >intended behavior, so prevent that from happening. >> >> Is this an actual race? Reading Ivan's prior message, it sounds >> like it's an ordering problem (in that bond_newlink calls >> register_netdevice after bond_changelink). > >Sorry, yeah, this is not actually a race condition, just an ordering >issue, bond_check_params() gets called at init time, which leads to >bond_option_mode_set() being called, and does so prior to >bond_create() running, which is where we actually call >register_netdevice(). So this only happens if there's a "mode" module parameter? That doesn't sound like the call path that Ivan described (coming in via bond_newlink). -J >> The change to bond_option_mode_set tests against reg_state, so >> presumably it wants to skip the first(?) time through, before the >> register_netdevice call; is that right? > >Correct. Later on, when the bonding driver is already loaded, and >parameter changes are made, bond_option_mode_set() gets called and if >the mode changes to or from active-backup, we do need/want this code >to run to update wanted and features flags properly. > > >-- >Jarod Wilson >jarod@redhat.com --- -Jay Vosburgh, jay.vosburgh@canonical.com
On Wed, Dec 2, 2020 at 3:17 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote: > > Jarod Wilson <jarod@redhat.com> wrote: > > >On Wed, Dec 2, 2020 at 12:55 PM Jay Vosburgh <jay.vosburgh@canonical.com> wrote: > >> > >> Jarod Wilson <jarod@redhat.com> wrote: > >> > >> >Don't try to adjust XFRM support flags if the bond device isn't yet > >> >registered. Bad things can currently happen when netdev_change_features() > >> >is called without having wanted_features fully filled in yet. Basically, > >> >this code was racing against register_netdevice() filling in > >> >wanted_features, and when it got there first, the empty wanted_features > >> >led to features also getting emptied out, which was definitely not the > >> >intended behavior, so prevent that from happening. > >> > >> Is this an actual race? Reading Ivan's prior message, it sounds > >> like it's an ordering problem (in that bond_newlink calls > >> register_netdevice after bond_changelink). > > > >Sorry, yeah, this is not actually a race condition, just an ordering > >issue, bond_check_params() gets called at init time, which leads to > >bond_option_mode_set() being called, and does so prior to > >bond_create() running, which is where we actually call > >register_netdevice(). > > So this only happens if there's a "mode" module parameter? That > doesn't sound like the call path that Ivan described (coming in via > bond_newlink). Ah. I think there's actually two different pathways that can trigger this. The first is for bonds created at module load time, which I was describing, the second is for a new bond created via bond_newlink() after the bonding module is already loaded, as described by Ivan. Both have the problem of bond_option_mode_set() running prior to register_netdevice(). Of course, that would suggest every bond currently comes up with unintentionally neutered flags, which I neglected to catch in earlier testing and development.
diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c index e0880a3840d7..5fe5232cc3f3 100644 --- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -4746,15 +4746,13 @@ void bond_setup(struct net_device *bond_dev) NETIF_F_HW_VLAN_CTAG_FILTER; bond_dev->hw_features |= NETIF_F_GSO_ENCAP_ALL; -#ifdef CONFIG_XFRM_OFFLOAD - bond_dev->hw_features |= BOND_XFRM_FEATURES; -#endif /* CONFIG_XFRM_OFFLOAD */ bond_dev->features |= bond_dev->hw_features; bond_dev->features |= NETIF_F_HW_VLAN_CTAG_TX | NETIF_F_HW_VLAN_STAG_TX; #ifdef CONFIG_XFRM_OFFLOAD - /* Disable XFRM features if this isn't an active-backup config */ - if (BOND_MODE(bond) != BOND_MODE_ACTIVEBACKUP) - bond_dev->features &= ~BOND_XFRM_FEATURES; + bond_dev->hw_features |= BOND_XFRM_FEATURES; + /* Only enable XFRM features if this is an active-backup config */ + if (BOND_MODE(bond) == BOND_MODE_ACTIVEBACKUP) + bond_dev->features |= BOND_XFRM_FEATURES; #endif /* CONFIG_XFRM_OFFLOAD */ } diff --git a/drivers/net/bonding/bond_options.c b/drivers/net/bonding/bond_options.c index 9abfaae1c6f7..19205cfac751 100644 --- a/drivers/net/bonding/bond_options.c +++ b/drivers/net/bonding/bond_options.c @@ -768,11 +768,15 @@ static int bond_option_mode_set(struct bonding *bond, bond->params.tlb_dynamic_lb = 1; #ifdef CONFIG_XFRM_OFFLOAD + if (bond->dev->reg_state != NETREG_REGISTERED) + goto noreg; + if (newval->value == BOND_MODE_ACTIVEBACKUP) bond->dev->wanted_features |= BOND_XFRM_FEATURES; else bond->dev->wanted_features &= ~BOND_XFRM_FEATURES; - netdev_change_features(bond->dev); + netdev_update_features(bond->dev); +noreg: #endif /* CONFIG_XFRM_OFFLOAD */ /* don't cache arp_validate between modes */
Don't try to adjust XFRM support flags if the bond device isn't yet registered. Bad things can currently happen when netdev_change_features() is called without having wanted_features fully filled in yet. Basically, this code was racing against register_netdevice() filling in wanted_features, and when it got there first, the empty wanted_features led to features also getting emptied out, which was definitely not the intended behavior, so prevent that from happening. Originally, I'd hoped to stop adjusting wanted_features at all in the bonding driver, as it's documented as being something only the network core should touch, but we actually do need to do this to properly update both the features and wanted_features fields when changing the bond type, or we get to a situation where ethtool sees: esp-hw-offload: off [requested on] I do think we should be using netdev_update_features instead of netdev_change_features here though, so we only send notifiers when the features actually changed. v2: rework based on further testing and suggestions from ivecera Fixes: a3b658cfb664 ("bonding: allow xfrm offload setup post-module-load") Reported-by: Ivan Vecera <ivecera@redhat.com> Suggested-by: Ivan Vecera <ivecera@redhat.com> Cc: Jay Vosburgh <j.vosburgh@gmail.com> Cc: Veaceslav Falico <vfalico@gmail.com> Cc: Andy Gospodarek <andy@greyhouse.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Thomas Davis <tadavis@lbl.gov> Cc: netdev@vger.kernel.org Signed-off-by: Jarod Wilson <jarod@redhat.com> --- drivers/net/bonding/bond_main.c | 10 ++++------ drivers/net/bonding/bond_options.c | 6 +++++- 2 files changed, 9 insertions(+), 7 deletions(-)