Message ID | 20200525165317.2269-1-greearb@candelatech.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Johannes Berg |
Headers | show |
Series | mac80211: do not iterate active interfaces when in re-configure | expand |
On Mon, 2020-05-25 at 09:53 -0700, greearb@candelatech.com wrote: > From: Ben Greear <greearb@candelatech.com> > > This appears to fix a problem where ath10k firmware would crash, "appears to", heh Really though, in general, you need to start thinking about mac80211 and the drivers as two separate things. You've also submitted another patch where you say "this iwlwifi thing requires mac80211 to change", and here you're submitting a patch saying "this ath10k thing requires mac80211 to change", but I don't see you considering much the fact that mac80211 is actually used for both. It'd be good to have a discussion of such things in the commit log for changes that will affect multiple drivers, rather than focusing on a single bug for a single driver. In general, not just in this patch. > diff --git a/net/mac80211/util.c b/net/mac80211/util.c > index 5db2cd0..186a696 100644 > --- a/net/mac80211/util.c > +++ b/net/mac80211/util.c > @@ -831,7 +831,7 @@ static void __iterate_interfaces(struct ieee80211_local *local, > break; > } > if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) && > - active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER)) > + (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER)))) > continue; Anyway, this seems wrong to me. If anything, it should skip those interfaces that weren't re-added yet, but not all of them. I'm pretty sure this would cause iwlwifi to misbehave with multiple interfaces, as it sometimes relies on iterating to understand what else is going on before configuring something. It might even be that this can only be done subject to driver choice. johannes
On 7/30/20 4:48 AM, Johannes Berg wrote: > On Mon, 2020-05-25 at 09:53 -0700, greearb@candelatech.com wrote: >> From: Ben Greear <greearb@candelatech.com> >> >> This appears to fix a problem where ath10k firmware would crash, > > > "appears to", heh > > Really though, in general, you need to start thinking about mac80211 and > the drivers as two separate things. You've also submitted another patch > where you say "this iwlwifi thing requires mac80211 to change", and here > you're submitting a patch saying "this ath10k thing requires mac80211 to > change", but I don't see you considering much the fact that mac80211 is > actually used for both. > > It'd be good to have a discussion of such things in the commit log for > changes that will affect multiple drivers, rather than focusing on a > single bug for a single driver. In general, not just in this patch. > >> diff --git a/net/mac80211/util.c b/net/mac80211/util.c >> index 5db2cd0..186a696 100644 >> --- a/net/mac80211/util.c >> +++ b/net/mac80211/util.c >> @@ -831,7 +831,7 @@ static void __iterate_interfaces(struct ieee80211_local *local, >> break; >> } >> if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) && >> - active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER)) >> + (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER)))) >> continue; > > Anyway, this seems wrong to me. If anything, it should skip those > interfaces that weren't re-added yet, but not all of them. I'm pretty > sure this would cause iwlwifi to misbehave with multiple interfaces, as > it sometimes relies on iterating to understand what else is going on > before configuring something. > > It might even be that this can only be done subject to driver choice. I have tested this patch hard for many years with hundreds of station vifs on ath9k radios and 64 station vifs on ath10k radios, probably way harder than anyone else is testing this sort of thing. Possibly you are correct about iwlwifi, I've never tested it with multi-interface, and as you see, have had bad luck on ax200. If you'd accept a patch with a new driver flag check (which I can enable for ath10k and ath9k), then I'll respin it thus. Thanks, Ben
On Thu, 2020-07-30 at 06:05 -0700, Ben Greear wrote: > > It might even be that this can only be done subject to driver choice. > > I have tested this patch hard for many years with hundreds of station vifs on ath9k radios and > 64 station vifs on ath10k radios, probably way harder than anyone else is testing > this sort of thing. Yeah, I'm sure! > Possibly you are correct about iwlwifi, I've never tested it with multi-interface, > and as you see, have had bad luck on ax200. Right. > If you'd accept a patch with a new driver flag check (which I can enable for > ath10k and ath9k), then I'll respin it thus. My order of preference would be something like 1. track per vif whether it was re-added, and skip before it is If that works, I can certainly get behind it for semantic reasons (the vif isn't yet active again), although even there I'm not sure how iwlwifi would behave - but that's something I'd look into and perhaps even consider a bug there since it shouldn't know about that interface yet. 2. If for some reason that doesn't work, add an iteration flag that controls this, rather than a per-device config? johannes
On 7/30/20 6:13 AM, Johannes Berg wrote: > On Thu, 2020-07-30 at 06:05 -0700, Ben Greear wrote: > >>> It might even be that this can only be done subject to driver choice. >> >> I have tested this patch hard for many years with hundreds of station vifs on ath9k radios and >> 64 station vifs on ath10k radios, probably way harder than anyone else is testing >> this sort of thing. > > Yeah, I'm sure! > >> Possibly you are correct about iwlwifi, I've never tested it with multi-interface, >> and as you see, have had bad luck on ax200. > > Right. > >> If you'd accept a patch with a new driver flag check (which I can enable for >> ath10k and ath9k), then I'll respin it thus. > > My order of preference would be something like > > 1. track per vif whether it was re-added, and skip before it is > > If that works, I can certainly get behind it for semantic reasons (the > vif isn't yet active again), although even there I'm not sure how > iwlwifi would behave - but that's something I'd look into and perhaps > even consider a bug there since it shouldn't know about that interface > yet. Wouldn't that complicate the sdata-in-driver check even more? So it would be something like is-in-driver-but-not-yet-reconfigured-in-driver? This sort of state is quite nasty in my experience. Almost better if we had less state. Driver should already know if it has an object or not, so redundant adds, or requests from mac80211 for objects the driver does not have can be handled properly by the driver? > > 2. If for some reason that doesn't work, add an iteration flag that > controls this, rather than a per-device config? I wrote this patch probably 5 or so years ago, and since I have fixed most of the ath10k firmware crashes that would tend to trigger this bug. I think I have no good way to test any complicated change to this patch. I am quite certain that ath9k and ath10k are at least not harmed by this patch, and certainly old ath10k benefited from this. So, I'm comfortable adding a driver level flag enabled for those two drivers. Changing all the calling locations to (maybe) add a flag or adding and tracking vif state is too risky to be worth this change I think. Thanks, Ben > > johannes >
On Thu, 2020-07-30 at 06:27 -0700, Ben Greear wrote: > > 1. track per vif whether it was re-added, and skip before it is > > > > If that works, I can certainly get behind it for semantic reasons (the > > vif isn't yet active again), although even there I'm not sure how > > iwlwifi would behave - but that's something I'd look into and perhaps > > even consider a bug there since it shouldn't know about that interface > > yet. > > Wouldn't that complicate the sdata-in-driver check even more? So it would > be something like is-in-driver-but-not-yet-reconfigured-in-driver? We should probably just clear the "is-in-driver" flag for the most part, and just remember "was-in-driver" so we know which ones to reconfigure, or something like that? > This sort of state is quite nasty in my experience. Almost better if > we had less state. Driver should already know if it has an object or not, > so redundant adds, or requests from mac80211 for objects the driver does not > have can be handled properly by the driver? I don't think that'll work. Drivers just act on "add" and "remove", they're not checking against double-add. And IMHO it makes more sense to have mac80211 do good sequencing than the throw our hands in the air and let the drivers have to be idempotent just because we can't figure out the right sequencing? > > 2. If for some reason that doesn't work, add an iteration flag that > > controls this, rather than a per-device config? > > I wrote this patch probably 5 or so years ago, and since I have fixed most of > the ath10k firmware crashes that would tend to trigger this bug. I think I have no > good way to test any complicated change to this patch. > > I am quite certain that ath9k and ath10k are at least not harmed by this patch, and > certainly old ath10k benefited from this. So, I'm comfortable adding a driver > level flag enabled for those two drivers. Changing all the calling locations to > (maybe) add a flag or adding and tracking vif state is too risky to be worth this change I > think. Uh, why? I mean, at least mechanically replacing all the callers in that driver wouldn't really be any different than adding a driver flag, but is so much more flexible and can be used elsewhere. I don't think I buy this argument much really. Yes, I understand that there's always resistance to changing something that works, but ... Really I think the right thing to do here would be 1., and let mac80211 sort out the sequencing. Consider add interface wlan0 add interface wlan1 iterate active interfaces -> wlan0 wlan1 add interface wlan2 iterate active interfaces -> wlan0 wlan1 wlan2 If you apply this scenario to a restart, which ought to be functionally equivalent to the normal startup, just compressed in time, you're basically saying that today you get add interface wlan0 add interface wlan1 iterate active interfaces -> wlan0 wlan1 wlan2 << problem here add interface wlan2 iterate active interfaces -> wlan0 wlan1 wlan2 which yeah, totally seems wrong. But fixing that to be add interface wlan0 add interface wlan1 iterate active interfaces -> <nothing> add interface wlan2 iterate active interfaces -> <nothing> (or maybe -> wlan0 wlan1 wlan2 if the reconfig already completed) seems equally wrong? johannes
On 7/30/20 6:41 AM, Johannes Berg wrote: > On Thu, 2020-07-30 at 06:27 -0700, Ben Greear wrote: > >>> 1. track per vif whether it was re-added, and skip before it is >>> >>> If that works, I can certainly get behind it for semantic reasons (the >>> vif isn't yet active again), although even there I'm not sure how >>> iwlwifi would behave - but that's something I'd look into and perhaps >>> even consider a bug there since it shouldn't know about that interface >>> yet. >> >> Wouldn't that complicate the sdata-in-driver check even more? So it would >> be something like is-in-driver-but-not-yet-reconfigured-in-driver? > > We should probably just clear the "is-in-driver" flag for the most part, > and just remember "was-in-driver" so we know which ones to reconfigure, > or something like that? > >> This sort of state is quite nasty in my experience. Almost better if >> we had less state. Driver should already know if it has an object or not, >> so redundant adds, or requests from mac80211 for objects the driver does not >> have can be handled properly by the driver? > > I don't think that'll work. Drivers just act on "add" and "remove", > they're not checking against double-add. And IMHO it makes more sense to > have mac80211 do good sequencing than the throw our hands in the air and > let the drivers have to be idempotent just because we can't figure out > the right sequencing? They certainly could check. At the least, removing something that is already gone should be an easy check. That would at least let mac80211 be more confident about cleanup. The current in-kernel code that cleared SDATA_IN_DRIVER in the non-recoverable firmware crash bit just assumed that drivers would clean things up, but I'm not even sure they can entirely clean things up since often the objects in their local memory are held by mac80211 objects (ie, the priv opaque objects). >>> 2. If for some reason that doesn't work, add an iteration flag that >>> controls this, rather than a per-device config? >> >> I wrote this patch probably 5 or so years ago, and since I have fixed most of >> the ath10k firmware crashes that would tend to trigger this bug. I think I have no >> good way to test any complicated change to this patch. >> >> I am quite certain that ath9k and ath10k are at least not harmed by this patch, and >> certainly old ath10k benefited from this. So, I'm comfortable adding a driver >> level flag enabled for those two drivers. Changing all the calling locations to >> (maybe) add a flag or adding and tracking vif state is too risky to be worth this change I >> think. > > Uh, why? I mean, at least mechanically replacing all the callers in that > driver wouldn't really be any different than adding a driver flag, but > is so much more flexible and can be used elsewhere. I don't think I buy > this argument much really. > > Yes, I understand that there's always resistance to changing something > that works, but ... > > Really I think the right thing to do here would be 1., and let mac80211 > sort out the sequencing. > > Consider > > add interface wlan0 > add interface wlan1 > iterate active interfaces -> wlan0 wlan1 > add interface wlan2 > iterate active interfaces -> wlan0 wlan1 wlan2 > > If you apply this scenario to a restart, which ought to be functionally > equivalent to the normal startup, just compressed in time, you're > basically saying that today you get > > add interface wlan0 > add interface wlan1 > iterate active interfaces -> wlan0 wlan1 wlan2 << problem here > add interface wlan2 > iterate active interfaces -> wlan0 wlan1 wlan2 > > which yeah, totally seems wrong. > > But fixing that to be > > add interface wlan0 > add interface wlan1 > iterate active interfaces -> > <nothing> > add interface wlan2 > iterate active interfaces -> <nothing> > (or > maybe -> wlan0 wlan1 wlan2 if the reconfig already completed) > > seems equally wrong? So, looks like there is a flags option passed to the iterate logic, and it is indeed called directly from drivers. So, I could just add a new flag value, and | it in when calling from ath10k. I'm not sure it would really solve the second case, but at least in practice, that one doesn't seem to be a problem with ath10k, and the first case *was* a problem. If that sounds OK to you, I'll work on the patch as described. Thanks, Ben > > johannes >
On Thu, 2020-07-30 at 07:52 -0700, Ben Greear wrote: > > Consider > > > > add interface wlan0 > > add interface wlan1 > > iterate active interfaces -> wlan0 wlan1 > > add interface wlan2 > > iterate active interfaces -> wlan0 wlan1 wlan2 > > > > If you apply this scenario to a restart, which ought to be functionally > > equivalent to the normal startup, just compressed in time, you're > > basically saying that today you get > > > > add interface wlan0 > > add interface wlan1 > > iterate active interfaces -> wlan0 wlan1 wlan2 << problem here > > add interface wlan2 > > iterate active interfaces -> wlan0 wlan1 wlan2 > > > > which yeah, totally seems wrong. > > > > But fixing that to be > > > > add interface wlan0 > > add interface wlan1 > > iterate active interfaces -> > > <nothing> > > add interface wlan2 > > iterate active interfaces -> <nothing> > > (or > > maybe -> wlan0 wlan1 wlan2 if the reconfig already completed) > > > > seems equally wrong? > > So, looks like there is a flags option passed to the iterate logic, and it is indeed called > directly from drivers. So, I could just add a new flag value, and | it in when calling from ath10k. > > I'm not sure it would really solve the second case, but at least in practice, > that one doesn't seem to be a problem with ath10k, and the first case *was* > a problem. > > If that sounds OK to you, I'll work on the patch as described. Right, that'd be the option 2. I described earlier. I can live with that even if I'd prefer to fix it as per 1. to "make sense". But I guess there could even be "more legitimate" cases to not want to iterate while restarting, even if I'm not really sure where that'd make sense? I guess Kalle should comment on whether he'd accept that into the driver. Kalle, as you can see above mac80211 appears to be broken wrt. iterating "active" interfaces during a restart - the iteration considers all interfaces active that were active before the restart, not just the ones that were already re-added to the driver. Ben says this causes trouble in ath10k. IMHO the right fix for this would be to fix the iteration to only reach the ones that have been re-added, like I've said above. OTOH, Ben isn't really convinced that that's right, and has experience with a patch that makes mac80211 return *no* interfaces whatsoever in the iteration when done while in restart. Like I say there, it seems wrong to me. But depending on what ath10k actually _does_ with this list, perhaps it's not an issue. Perhaps it's just transient state that it derives from it, so if it does it again after the reconfig is completed, it would in fact get all the information it needed. I'm pretty sure this would break iwlwifi, so one option (less preferred) would be to add a flag to say "skip iteration in reconfig". actually does the driver know it's in reconfig? Perhaps it could even do that completely on its own? Anyway, the question is what you think about doing such a thing in the driver, if it fixes issues even if it's probably not really correct. johannes
On 7/30/20 8:03 AM, Johannes Berg wrote: > On Thu, 2020-07-30 at 07:52 -0700, Ben Greear wrote: > >>> Consider >>> >>> add interface wlan0 >>> add interface wlan1 >>> iterate active interfaces -> wlan0 wlan1 >>> add interface wlan2 >>> iterate active interfaces -> wlan0 wlan1 wlan2 >>> >>> If you apply this scenario to a restart, which ought to be functionally >>> equivalent to the normal startup, just compressed in time, you're >>> basically saying that today you get >>> >>> add interface wlan0 >>> add interface wlan1 >>> iterate active interfaces -> wlan0 wlan1 wlan2 << problem here >>> add interface wlan2 >>> iterate active interfaces -> wlan0 wlan1 wlan2 >>> >>> which yeah, totally seems wrong. >>> >>> But fixing that to be >>> >>> add interface wlan0 >>> add interface wlan1 >>> iterate active interfaces -> >>> <nothing> >>> add interface wlan2 >>> iterate active interfaces -> <nothing> >>> (or >>> maybe -> wlan0 wlan1 wlan2 if the reconfig already completed) >>> >>> seems equally wrong? >> >> So, looks like there is a flags option passed to the iterate logic, and it is indeed called >> directly from drivers. So, I could just add a new flag value, and | it in when calling from ath10k. >> >> I'm not sure it would really solve the second case, but at least in practice, >> that one doesn't seem to be a problem with ath10k, and the first case *was* >> a problem. >> >> If that sounds OK to you, I'll work on the patch as described. > > Right, that'd be the option 2. I described earlier. I can live with that > even if I'd prefer to fix it as per 1. to "make sense". But I guess > there could even be "more legitimate" cases to not want to iterate while > restarting, even if I'm not really sure where that'd make sense? > > I guess Kalle should comment on whether he'd accept that into the > driver. > > Kalle, as you can see above mac80211 appears to be broken wrt. iterating > "active" interfaces during a restart - the iteration considers all > interfaces active that were active before the restart, not just the ones > that were already re-added to the driver. Ben says this causes trouble > in ath10k. > > IMHO the right fix for this would be to fix the iteration to only reach > the ones that have been re-added, like I've said above. OTOH, Ben isn't > really convinced that that's right, and has experience with a patch that > makes mac80211 return *no* interfaces whatsoever in the iteration when > done while in restart. Like I say there, it seems wrong to me. > > But depending on what ath10k actually _does_ with this list, perhaps > it's not an issue. Perhaps it's just transient state that it derives > from it, so if it does it again after the reconfig is completed, it > would in fact get all the information it needed. > > I'm pretty sure this would break iwlwifi, so one option (less preferred) > would be to add a flag to say "skip iteration in reconfig". > > actually does the driver know it's in reconfig? Perhaps it could even do > that completely on its own? > > Anyway, the question is what you think about doing such a thing in the > driver, if it fixes issues even if it's probably not really correct. > > johannes So, no response from Kalle on this for some time. I thought I'd ping one more time before I make an effort to create another out-of-tree patch. Johannes, if you are OK with a new flag in mac80211, then I can patch ath10k-ct driver regardless of whether other drivers use it. Thanks, Ben
Johannes Berg <johannes@sipsolutions.net> writes: > On Thu, 2020-07-30 at 07:52 -0700, Ben Greear wrote: > >> > Consider >> > >> > add interface wlan0 >> > add interface wlan1 >> > iterate active interfaces -> wlan0 wlan1 >> > add interface wlan2 >> > iterate active interfaces -> wlan0 wlan1 wlan2 >> > >> > If you apply this scenario to a restart, which ought to be functionally >> > equivalent to the normal startup, just compressed in time, you're >> > basically saying that today you get >> > >> > add interface wlan0 >> > add interface wlan1 >> > iterate active interfaces -> wlan0 wlan1 wlan2 << problem here >> > add interface wlan2 >> > iterate active interfaces -> wlan0 wlan1 wlan2 >> > >> > which yeah, totally seems wrong. >> > >> > But fixing that to be >> > >> > add interface wlan0 >> > add interface wlan1 >> > iterate active interfaces -> >> > <nothing> >> > add interface wlan2 >> > iterate active interfaces -> <nothing> >> > (or >> > maybe -> wlan0 wlan1 wlan2 if the reconfig already completed) >> > >> > seems equally wrong? >> >> So, looks like there is a flags option passed to the iterate logic, >> and it is indeed called >> directly from drivers. So, I could just add a new flag value, and | >> it in when calling from ath10k. >> >> I'm not sure it would really solve the second case, but at least in practice, >> that one doesn't seem to be a problem with ath10k, and the first case *was* >> a problem. >> >> If that sounds OK to you, I'll work on the patch as described. > > Right, that'd be the option 2. I described earlier. I can live with that > even if I'd prefer to fix it as per 1. to "make sense". But I guess > there could even be "more legitimate" cases to not want to iterate while > restarting, even if I'm not really sure where that'd make sense? > > I guess Kalle should comment on whether he'd accept that into the > driver. > > Kalle, as you can see above mac80211 appears to be broken wrt. iterating > "active" interfaces during a restart - the iteration considers all > interfaces active that were active before the restart, not just the ones > that were already re-added to the driver. Ben says this causes trouble > in ath10k. > > IMHO the right fix for this would be to fix the iteration to only reach > the ones that have been re-added, like I've said above. OTOH, Ben isn't > really convinced that that's right, and has experience with a patch that > makes mac80211 return *no* interfaces whatsoever in the iteration when > done while in restart. Like I say there, it seems wrong to me. > > But depending on what ath10k actually _does_ with this list, perhaps > it's not an issue. Perhaps it's just transient state that it derives > from it, so if it does it again after the reconfig is completed, it > would in fact get all the information it needed. > > I'm pretty sure this would break iwlwifi, so one option (less preferred) > would be to add a flag to say "skip iteration in reconfig". To me it sounds fine to have such flag in ath10k. I just want the ath10k patch test tested with upstream ath10k and firmware because Ben's driver and firmware might behave differently. > actually does the driver know it's in reconfig? Perhaps it could even do > that completely on its own? We do have ar->state which tracks the reconfig process. For example, in ath10k_reconfig_complete() we have this check: /* If device failed to restart it will be in a different state, e.g. * ATH10K_STATE_WEDGED */ if (ar->state == ATH10K_STATE_RESTARTED) { ath10k_info(ar, "device successfully recovered\n"); ar->state = ATH10K_STATE_ON; ieee80211_wake_queues(ar->hw); }
diff --git a/net/mac80211/util.c b/net/mac80211/util.c index 5db2cd0..186a696 100644 --- a/net/mac80211/util.c +++ b/net/mac80211/util.c @@ -831,7 +831,7 @@ static void __iterate_interfaces(struct ieee80211_local *local, break; } if (!(iter_flags & IEEE80211_IFACE_ITER_RESUME_ALL) && - active_only && !(sdata->flags & IEEE80211_SDATA_IN_DRIVER)) + (active_only && (local->in_reconfig || !(sdata->flags & IEEE80211_SDATA_IN_DRIVER)))) continue; if (ieee80211_sdata_running(sdata) || !active_only) iterator(data, sdata->vif.addr,