Message ID | 8845e49b-3165-e6df-5935-c86278d220d9@broadcom.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Kalle Valo |
Headers | show |
Hi everyone, 2017-08-10 14:43 GMT+02:00 Arend van Spriel <arend.vanspriel@broadcom.com>: > > > On 10-08-17 07:39, Kalle Valo wrote: >> >> Hi Mahesh and Andy, >> >> James Feeney reported that there's a serious regression in bonding >> module since v4.12, it doesn't work with wireless drivers anymore as >> wireless drivers don't report the link speed via ethtool: >> >> https://bugzilla.kernel.org/show_bug.cgi?id=196547 >> >> In the bug report it's said that this commit is the culprit: >> >> 3f3c278c94dd bonding: fix active-backup transition > > > This commit references another one. ie. commit c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state"). Before this commit the result of __ethtool_get_link_ksettings() was simply ignored. Actually it was not simply ignored in any case: Further down in bond_miimon_commit() there's a conditional call to bond_3ad_handle_link_change() which triggers an update using __get_link_speed() to actually access the result. A similar handler is also called for lb modes. > > Commit 3f3c278c94dd ("bonding: fix active-backup transition") moves setting the link state to the call sites of bond_update_speed_duplex(), just not all call sites. > >> Is there a fix for this or should that commit be reverted? This seems to >> be a serious regression as there are multiple reports already and we >> should get it fixed for v4.13, and the fix backported to v4.12 stable >> release. > > > The ethtool callbacks really seem optional. At least in brcmfmac, the wireless driver I maintain, I only provide get_drvinfo callback and there is no warning triggered upon registering the netdev. The changes above now require each netdev to implement the get_link_ksettings callback (get_settings is deprecated) or the link is marked as DOWN ruling it out to be used as active bond slave. To the end-users who were using bonding this is simply a regression. So to fix that both changes should be reverted in my opinion. Yes, also to me as user of a wireless slave in an active-backup bond it's clearly a regression. But only partially for some modes like active-backup since the bonding documentation [1] clearly lists as a prerequisite 1) for 802.3ad: "Ethtool support in the base drivers for retrieving the speed and duplex of each slave." 2) for tlb/alb: "Ethtool support in the base drivers for retrieving the speed of each slave." This was previously not directly enforced in the bonding code and thus probably occasionally caused unexpected behavior. At least such behavior is what to my understanding commit c4adfc822bf5 ("bonding: make speed, duplex setting consistent with link state") and 3f3c278c94dd ("bonding: fix active-backup transition") intend to fix with an apparent focus on 802.3ad. However those commits went too far by requiring a get_link_ksettings implementation by every slave driver REGARDLESS of the bond mode. Earlier today I submitted the patch (bonding: require speed/duplex only for 802.3ad, alb and tlb) [2] that only partially reverts what is a regression following my aforementioned logic. This seems to me like the best solution in the short term since it should satisfy both usergroups represented by Mahesh and James and restores consistence with the bonding documentation. James already commented approvingly on that patch in the bug report. [3] Regards Andreas [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/bonding.txt [2] https://www.spinics.net/lists/netdev/msg448662.html [3] https://bugzilla.kernel.org/show_bug.cgi?id=196547#c10
Andreas Born <futur.andy@googlemail.com> writes: > Earlier today I submitted the patch (bonding: require speed/duplex > only for 802.3ad, alb and tlb) [2] that only partially reverts what is > a regression following my aforementioned logic. This seems to me like > the best solution in the short term since it should satisfy both > usergroups represented by Mahesh and James and restores consistence > with the bonding documentation. James already commented approvingly on > that patch in the bug report. [3] > > Regards > Andreas > > [1] > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/bonding.txt Great, thanks. I'll take it the patch is meant for net tree (and not net-next) so that it will be fixed for v4.13? Also it should backported to v4.12 stable tree. I don't see any mention of that in the patch submission and that's why I'm asking.
Kalle Valo <kvalo@codeaurora.org> writes: > Andreas Born <futur.andy@googlemail.com> writes: > >> Earlier today I submitted the patch (bonding: require speed/duplex >> only for 802.3ad, alb and tlb) [2] that only partially reverts what is >> a regression following my aforementioned logic. This seems to me like >> the best solution in the short term since it should satisfy both >> usergroups represented by Mahesh and James and restores consistence >> with the bonding documentation. James already commented approvingly on >> that patch in the bug report. [3] >> >> Regards >> Andreas >> >> [1] >> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/bonding.txt > > Great, thanks. > > I'll take it the patch is meant for net tree (and not net-next) so that > it will be fixed for v4.13? Also it should backported to v4.12 stable > tree. I don't see any mention of that in the patch submission and that's > why I'm asking. I see that Dave applied this to the net tree and queued also for stable, excellent. Thanks everyone! https://patchwork.ozlabs.org/patch/800080/
Hey Kalle Still, a problem: On 08/12/2017 01:35 AM, Kalle Valo wrote: > Kalle Valo <kvalo@codeaurora.org> writes: > >> Andreas Born <futur.andy@googlemail.com> writes: >> >>> Earlier today I submitted the patch (bonding: require speed/duplex >>> only for 802.3ad, alb and tlb) [2] that only partially reverts what is >>> a regression following my aforementioned logic. This seems to me like >>> the best solution in the short term since it should satisfy both >>> usergroups represented by Mahesh and James and restores consistence >>> with the bonding documentation. James already commented approvingly on >>> that patch in the bug report. [3] >>> >>> Regards >>> Andreas >>> >>> [1] >>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/networking/bonding.txt >> >> Great, thanks. >> >> I'll take it the patch is meant for net tree (and not net-next) so that >> it will be fixed for v4.13? Also it should backported to v4.12 stable >> tree. I don't see any mention of that in the patch submission and that's >> why I'm asking. > > I see that Dave applied this to the net tree and queued also for stable, > excellent. Thanks everyone! > > https://patchwork.ozlabs.org/patch/800080/ > Andreas patch failed to address the continuous, *10-times per second* warning which will "spam" the log file, sometimes the console, whenever the test fails: if (bond_update_speed_duplex(slave) && bond_needs_speed_duplex(bond)) {...} which then has: netdev_warn(bond->dev, "failed to get link speed/duplex for%s\n", slave->dev->name); That is the sort of irresponsible code that "works fine" as long as there are no errors, and it never actually runs! I'm guessing that the simple fix is to use "net_warn_ratelimited()" instead of "netdev_warn()", where net/core/utils.c says: /* * All net warning printk()s should be guarded by this function. */ int net_ratelimit(void) { return __ratelimit(&net_ratelimit_state); } though Andreas has also suggested "pr_warn_ratelimited()", which instead uses "__ratelimit(&_rs)". Here's a link to the rate-limiting patch, after Andreas' patch is already applied, since you say that David has already applied the first patch: https://bugzilla.kernel.org/attachment.cgi?id=257903 Thanks James
2017-08-12 21:30 GMT+02:00 James Feeney <james@nurealm.net>: > > > Andreas patch failed to address the continuous, *10-times per second* warning > which will "spam" the log file, sometimes the console, whenever the test fails: > if (bond_update_speed_duplex(slave) && bond_needs_speed_duplex(bond)) {...} > which then has: > netdev_warn(bond->dev, "failed to get link speed/duplex for%s\n", > slave->dev->name); > > That is the sort of irresponsible code that "works fine" as long as there are no > errors, and it never actually runs! > > I'm guessing that the simple fix is to use "net_warn_ratelimited()" instead of > "netdev_warn()", where net/core/utils.c says: > > /* > * All net warning printk()s should be guarded by this function. > */ > int net_ratelimit(void) > { > return __ratelimit(&net_ratelimit_state); > } > > though Andreas has also suggested "pr_warn_ratelimited()", which instead uses > "__ratelimit(&_rs)". > > Here's a link to the rate-limiting patch, after Andreas' patch is already > applied, since you say that David has already applied the first patch: > > https://bugzilla.kernel.org/attachment.cgi?id=257903 > The other day I also sent a patch for part 2 of your bug report on this list. Better safe than sorry by doing one thing per patch. This second patch received feedback and its v2 is currently awaiting review. Sorry, for informing you only now. I should have cc'd you in the first place. You can lookup the state of that patch on patchwork. [1] It's basically the same as previous versions just changing the original code even less. . I sort of expect that this second patch won't be queued for stable. But let's see... Essentially it's a regression too and additionally would've been fixed by a plain revert. On a side note I would recommend some of my own reading to you about patch submission in general [1] and on netdev specifically [2]. Putting your suggested changes in the right form makes everyone's life easier especially your own saving you lots of back and forth by mail. Seeing the amount of mail on this list during the last days was reason enough to comprehend the necessity of those standards. And, just wondering, who's going to eventually close that bugreport? https://bugzilla.kernel.org/show_bug.cgi?id=196547 Cheers Andreas [1] https://patchwork.ozlabs.org/patch/800770/ [2] https://www.kernel.org/doc/html/latest/process/submitting-patches.html [3] https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/tree/Documentation/networking/netdev-FAQ.txt
On 08/13/2017 11:42 AM, Andreas Born wrote: > On a side note I would recommend some of my own reading to you about > patch submission in general [1] and on netdev specifically [2]. Mmm - [2] and [3], I suspect. Thanks Andreas. I'll be studying those. Yeah, I'm still learning what is needed and what works. Sometimes, just a note to the author is more than enough to resolve a problem. Sometimes, discussion is needed. And other times... well, certain people are infamous... but no problem here, thankfully. > And, just wondering, who's going to eventually close that bugreport? > https://bugzilla.kernel.org/show_bug.cgi?id=196547 I can close it when the patches actually land in the kernel. I'm glad to see that there was an "Ack" from Mahesh. On the topic of wireless support for kernel ethtool reporting, I'm wondering, is there is any consensus about that? And, for instance, is there any *other* way for the bonding module to make "better link" decisions for wireless links? As "wireless" becomes more capable, possibly more diverse, and probably more essential for computing, this is likely to become a bigger issue. Ben Greear mentioned that he had added some support to the ath10k driver. Dan Williams mentioned the possibility of updating the mac80211 stack for support. And Arend van Spriel suggested that the issue might best be left for the next Netconf. Immediate problem solved, but maybe a larger issue still needs to be addressed? James
From: James Feeney <james@nurealm.net> Date: Wed, 16 Aug 2017 14:44:27 -0600 > On 08/13/2017 11:42 AM, Andreas Born wrote: >> On a side note I would recommend some of my own reading to you about >> patch submission in general [1] and on netdev specifically [2]. > > Mmm - [2] and [3], I suspect. Thanks Andreas. I'll be studying those. Yeah, > I'm still learning what is needed and what works. Sometimes, just a note to the > author is more than enough to resolve a problem. Sometimes, discussion is > needed. And other times... well, certain people are infamous... but no problem > here, thankfully. > >> And, just wondering, who's going to eventually close that bugreport? >> https://bugzilla.kernel.org/show_bug.cgi?id=196547 > > I can close it when the patches actually land in the kernel. All of the patches are in Linus's tree.
On Wed, 2017-08-16 at 14:44 -0600, James Feeney wrote: > On 08/13/2017 11:42 AM, Andreas Born wrote: > > On a side note I would recommend some of my own reading to you > > about > > patch submission in general [1] and on netdev specifically [2]. > > Mmm - [2] and [3], I suspect. Thanks Andreas. I'll be studying > those. Yeah, > I'm still learning what is needed and what works. Sometimes, just a > note to the > author is more than enough to resolve a problem. Sometimes, > discussion is > needed. And other times... well, certain people are infamous... but > no problem > here, thankfully. > > > And, just wondering, who's going to eventually close that > > bugreport? > > https://bugzilla.kernel.org/show_bug.cgi?id=196547 > > I can close it when the patches actually land in the kernel. I'm > glad to see > that there was an "Ack" from Mahesh. > > On the topic of wireless support for kernel ethtool reporting, I'm > wondering, is > there is any consensus about that? > > And, for instance, is there any *other* way for the bonding module to > make > "better link" decisions for wireless links? As "wireless" becomes > more capable, > possibly more diverse, and probably more essential for computing, > this is likely > to become a bigger issue. > > Ben Greear mentioned that he had added some support to the ath10k > driver. Dan > Williams mentioned the possibility of updating the mac80211 stack for > support. > And Arend van Spriel suggested that the issue might best be left for > the next > Netconf. > > Immediate problem solved, but maybe a larger issue still needs to be > addressed? Again, it's technically possible to add the link settings support to wireless drivers. But the issue is around what bonding would do with that information in its various modes. My biggest suggestion is that perhaps bonding should grow hysteresis for link speeds. Since WiFi can change speed every packet, you probably don't want the bond characteristics changing every couple seconds just in case your WiFi link is jumping around. Ethernet won't bounce around that much, so the hysteresis would have no effect there. Or, if people are concerned about response time to speed changes on ethernet (where you probably do want an instant switch-over) some new flag to indicate that certain devices don't have stable speeds over time. Dan
From: Dan Williams <dcbw@redhat.com> Date: Wed, 16 Aug 2017 16:22:41 -0500 > My biggest suggestion is that perhaps bonding should grow hysteresis > for link speeds. Since WiFi can change speed every packet, you probably > don't want the bond characteristics changing every couple seconds just > in case your WiFi link is jumping around. Ethernet won't bounce around > that much, so the hysteresis would have no effect there. Or, if people > are concerned about response time to speed changes on ethernet (where > you probably do want an instant switch-over) some new flag to indicate > that certain devices don't have stable speeds over time. Or just report the average of the range the wireless link can hit, and be done with it. I think you guys are overcomplicating things.
On Wed, 2017-08-16 at 14:31 -0700, David Miller wrote: > From: Dan Williams <dcbw@redhat.com> > Date: Wed, 16 Aug 2017 16:22:41 -0500 > > > My biggest suggestion is that perhaps bonding should grow > hysteresis > > for link speeds. Since WiFi can change speed every packet, you > probably > > don't want the bond characteristics changing every couple seconds > just > > in case your WiFi link is jumping around. Ethernet won't bounce > around > > that much, so the hysteresis would have no effect there. Or, if > people > > are concerned about response time to speed changes on ethernet > (where > > you probably do want an instant switch-over) some new flag to > indicate > > that certain devices don't have stable speeds over time. > > Or just report the average of the range the wireless link can hit, > and > be done with it. > > I think you guys are overcomplicating things. That range can be from 1 to > 800Mb/s. No, it won't usually be all over that range, but it won't be uncommon to fluctuate by hundreds of Mb/s. I'm not sure a simple average is really the answer here. Even doing that would require new knobs to ethtool, since the rate depends heavily on card capabilities and also what AP you're connected to *at that moment*. If you roam to another AP, then the max speed can certainly change. You'll probably say "aim for the 75% case" or something like that, which is fine, but then you're depending on your 75% case to be (a) single AP, (b) never move (eg, only bond wifi + ethernet), (c) little radio interference. I'm not sure I'd buy that. If I've put words in your mouth, forgive me. Dan
On 08/16/2017 07:11 PM, Dan Williams wrote: > On Wed, 2017-08-16 at 14:31 -0700, David Miller wrote: >> From: Dan Williams <dcbw@redhat.com> >> Date: Wed, 16 Aug 2017 16:22:41 -0500 >> >>> My biggest suggestion is that perhaps bonding should grow >> hysteresis >>> for link speeds. Since WiFi can change speed every packet, you >> probably >>> don't want the bond characteristics changing every couple seconds >> just >>> in case your WiFi link is jumping around. Ethernet won't bounce >> around >>> that much, so the hysteresis would have no effect there. Or, if >> people >>> are concerned about response time to speed changes on ethernet >> (where >>> you probably do want an instant switch-over) some new flag to >> indicate >>> that certain devices don't have stable speeds over time. >> >> Or just report the average of the range the wireless link can hit, >> and >> be done with it. >> >> I think you guys are overcomplicating things. > > That range can be from 1 to > 800Mb/s. No, it won't usually be all > over that range, but it won't be uncommon to fluctuate by hundreds of > Mb/s. I'm not sure a simple average is really the answer here. Even > doing that would require new knobs to ethtool, since the rate depends > heavily on card capabilities and also what AP you're connected to *at > that moment*. If you roam to another AP, then the max speed can > certainly change. > > You'll probably say "aim for the 75% case" or something like that, > which is fine, but then you're depending on your 75% case to be (a) > single AP, (b) never move (eg, only bond wifi + ethernet), (c) little > radio interference. I'm not sure I'd buy that. If I've put words in > your mouth, forgive me. If you keep ethtool API simple and just return the last (rx-rate + tx-rate) / 2, or the rate averaged over the last 100 frames or 10 seconds, then the caller can do longer term averaging as it sees fit. Probably no need for lots of averaging complexity in the kernel. rate-ctrl for wifi basically doesn't happen until you transmit or receive a fairly steady stream, so it will fluctuate a lot. Thanks, Ben > > Dan >
From: Dan Williams <dcbw@redhat.com> Date: Wed, 16 Aug 2017 21:11:47 -0500 > You'll probably say "aim for the 75% case" or something like that, > which is fine, but then you're depending on your 75% case to be (a) > single AP, (b) never move (eg, only bond wifi + ethernet), (c) little > radio interference. I'm not sure I'd buy that. If I've put words in > your mouth, forgive me. You can interpret what I'm saying as "some reasonable value is better than no value at all." You can keep arguing about AP changes, radio interference, etc. but anything is better than the current situation. Start small and simple, try to handle everything over time and gradually.
On Wed, 2017-08-16 at 19:36 -0700, Ben Greear wrote: > On 08/16/2017 07:11 PM, Dan Williams wrote: > > On Wed, 2017-08-16 at 14:31 -0700, David Miller wrote: > > > From: Dan Williams <dcbw@redhat.com> > > > Date: Wed, 16 Aug 2017 16:22:41 -0500 > > > > > > > My biggest suggestion is that perhaps bonding should grow > > > > > > hysteresis > > > > for link speeds. Since WiFi can change speed every packet, you > > > > > > probably > > > > don't want the bond characteristics changing every couple > > > > seconds > > > > > > just > > > > in case your WiFi link is jumping around. Ethernet won't > > > > bounce > > > > > > around > > > > that much, so the hysteresis would have no effect there. Or, > > > > if > > > > > > people > > > > are concerned about response time to speed changes on ethernet > > > > > > (where > > > > you probably do want an instant switch-over) some new flag to > > > > > > indicate > > > > that certain devices don't have stable speeds over time. > > > > > > Or just report the average of the range the wireless link can > > > hit, > > > and > > > be done with it. > > > > > > I think you guys are overcomplicating things. > > > > That range can be from 1 to > 800Mb/s. No, it won't usually be all > > over that range, but it won't be uncommon to fluctuate by hundreds > > of > > Mb/s. I'm not sure a simple average is really the answer > > here. Even > > doing that would require new knobs to ethtool, since the rate > > depends > > heavily on card capabilities and also what AP you're connected to > > *at > > that moment*. If you roam to another AP, then the max speed can > > certainly change. > > > > You'll probably say "aim for the 75% case" or something like that, > > which is fine, but then you're depending on your 75% case to be (a) > > single AP, (b) never move (eg, only bond wifi + ethernet), (c) > > little > > radio interference. I'm not sure I'd buy that. If I've put words > > in > > your mouth, forgive me. > > If you keep ethtool API simple and just return the last (rx-rate + > tx-rate) / 2, or the rate averaged > over the last 100 frames or 10 seconds, then the caller can do longer > term averaging > as it sees fit. Probably no need for lots of averaging complexity in > the kernel. Yeah, that works too, but I was thinking it was better to present the actual data through ethtool so that things other than bonding could use it, and since bonding is the thing that actually cares about the fluctuation, make it do the more extensive processing. Dan > rate-ctrl for wifi basically doesn't happen until you transmit or > receive a > fairly steady stream, so it will fluctuate a lot.
On 08/16/2017 08:18 PM, Dan Williams wrote: > On Wed, 2017-08-16 at 19:36 -0700, Ben Greear wrote: >> On 08/16/2017 07:11 PM, Dan Williams wrote: >>> On Wed, 2017-08-16 at 14:31 -0700, David Miller wrote: >>>> From: Dan Williams <dcbw@redhat.com> >>>> Date: Wed, 16 Aug 2017 16:22:41 -0500 >>>> >>>>> My biggest suggestion is that perhaps bonding should grow >>>> >>>> hysteresis >>>>> for link speeds. Since WiFi can change speed every packet, you >>>> >>>> probably >>>>> don't want the bond characteristics changing every couple >>>>> seconds >>>> >>>> just >>>>> in case your WiFi link is jumping around. Ethernet won't >>>>> bounce >>>> >>>> around >>>>> that much, so the hysteresis would have no effect there. Or, >>>>> if >>>> >>>> people >>>>> are concerned about response time to speed changes on ethernet >>>> >>>> (where >>>>> you probably do want an instant switch-over) some new flag to >>>> >>>> indicate >>>>> that certain devices don't have stable speeds over time. >>>> >>>> Or just report the average of the range the wireless link can >>>> hit, >>>> and >>>> be done with it. >>>> >>>> I think you guys are overcomplicating things. >>> >>> That range can be from 1 to > 800Mb/s. No, it won't usually be all >>> over that range, but it won't be uncommon to fluctuate by hundreds >>> of >>> Mb/s. I'm not sure a simple average is really the answer >>> here. Even >>> doing that would require new knobs to ethtool, since the rate >>> depends >>> heavily on card capabilities and also what AP you're connected to >>> *at >>> that moment*. If you roam to another AP, then the max speed can >>> certainly change. >>> >>> You'll probably say "aim for the 75% case" or something like that, >>> which is fine, but then you're depending on your 75% case to be (a) >>> single AP, (b) never move (eg, only bond wifi + ethernet), (c) >>> little >>> radio interference. I'm not sure I'd buy that. If I've put words >>> in >>> your mouth, forgive me. >> >> If you keep ethtool API simple and just return the last (rx-rate + >> tx-rate) / 2, or the rate averaged >> over the last 100 frames or 10 seconds, then the caller can do longer >> term averaging >> as it sees fit. Probably no need for lots of averaging complexity in >> the kernel. > > Yeah, that works too, but I was thinking it was better to present the > actual data through ethtool so that things other than bonding could use > it, and since bonding is the thing that actually cares about the > fluctuation, make it do the more extensive processing. What do you mean by 'actual data'? If you want to know the most accurate transmit/rx rate info, then you need to pay attention to each and every frame's tx/rx rate, as well as it's ampdu/amsdu, retries, etc. It is virtually impossible. So, you will have to settle for something less... I suggest something simple to calculate, similar to existing values that are available via debugfs and/or 'iw dev foo station dump', etc. Let higher layers manipulate the raw data as they see fit (they can query ethtool as often as they like). Thanks, Ben
Dan Williams <dcbw@redhat.com> wrote: [...] >You'll probably say "aim for the 75% case" or something like that, >which is fine, but then you're depending on your 75% case to be (a) >single AP, (b) never move (eg, only bond wifi + ethernet), (c) little >radio interference. I'm not sure I'd buy that. If I've put words in >your mouth, forgive me. The primary use case that I'm aware of for bonding with wireless devices is in active-backup mode, paired with an ethernet adapter set as the bonding "primary" device. I think this is the case that absolutely should just work. I don't think bonding cares (or should care) about (a) - (c) for this use. Your point (b) suggests that there are use cases other than the above; I'm unfamiliar with any use other than wifi + ethernet, can you elaborate? -J --- -Jay Vosburgh, jay.vosburgh@canonical.com
--- a/drivers/net/bonding/bond_main.c +++ b/drivers/net/bonding/bond_main.c @@ -365,9 +365,10 @@ int bond_set_carrier(struct bonding *bond) /* Get link speed and duplex from the slave's base driver * using ethtool. If for some reason the call fails or the * values are invalid, set speed and duplex to -1, - * and return. + * and return. Return 1 if speed or duplex settings are + * UNKNOWN; 0 otherwise. */ -static void bond_update_speed_duplex(struct slave *slave) +static int bond_update_speed_duplex(struct slave *slave) { struct net_device *slave_dev = slave->dev; struct ethtool_link_ksettings ecmd; @@ -377,24 +378,27 @@ static void bond_update_speed_duplex(struct slave *slave) slave->duplex = DUPLEX_UNKNOWN; res = __ethtool_get_link_ksettings(slave_dev, &ecmd); - if (res < 0) - return; - - if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) - return; - + if (res < 0) { + slave->link = BOND_LINK_DOWN; + return 1; + } + if (ecmd.base.speed == 0 || ecmd.base.speed == ((__u32)-1)) { + slave->link = BOND_LINK_DOWN; + return 1; + } Commit 3f3c278c94dd ("bonding: fix active-backup transition") moves setting the link state to the call sites of bond_update_speed_duplex(), just not all call sites. > Is there a fix for this or should that commit be reverted? This seems to > be a serious regression as there are multiple reports already and we > should get it fixed for v4.13, and the fix backported to v4.12 stable > release. The ethtool callbacks really seem optional. At least in brcmfmac, the wireless driver I maintain, I only provide get_drvinfo callback and there is no warning triggered upon registering the netdev. The changes above now require each netdev to implement the get_link_ksettings callback (get_settings is deprecated) or the link is marked as DOWN