Message ID | 20230417185457.1000642-1-prestwoj@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | station: delay roam until fully connected | expand |
Hi James, On 4/17/23 13:54, James Prestwood wrote: > If DHCP/netconfig takes a long time to finish IWD could end up trying > to roam to another BSS. This could happen if the signal strength is > below the set threshold while connecting. Once authenticated to the > AP IWD would get a CQM event and start a roam timer. If DHCP did not > finish before that timer IWD would start a roam. IWD would not handle > this well since a roam doesn't trigger DHCP to start and likely the > network infrustructure would be very confused. Typo here, 'infrustructure'. > > To fix this continue to delay the roam attempt until IWD reaches the > connected state. I'm not convinced this is the right approach. If the signal is bad enough that DHCP is failing, then delaying the roam is probably the worst choice we can make, no? I would think the better way to fix this would be to track netconfig state, and if it never reached a fully configured state prior to roaming, reset & reconfigure upon a successful "roam". Note that since we haven't signaled "connected" just yet, we're really still just in the "connecting" phase. > --- > src/station.c | 16 ++++++++++++++++ > 1 file changed, 16 insertions(+) > Regards, -Denis
Hi Denis, On 4/23/23 4:39 PM, Denis Kenzior wrote: > Hi James, > > On 4/17/23 13:54, James Prestwood wrote: >> If DHCP/netconfig takes a long time to finish IWD could end up trying >> to roam to another BSS. This could happen if the signal strength is >> below the set threshold while connecting. Once authenticated to the >> AP IWD would get a CQM event and start a roam timer. If DHCP did not >> finish before that timer IWD would start a roam. IWD would not handle >> this well since a roam doesn't trigger DHCP to start and likely the >> network infrustructure would be very confused. > > Typo here, 'infrustructure'. > >> >> To fix this continue to delay the roam attempt until IWD reaches the >> connected state. > > I'm not convinced this is the right approach. If the signal is bad > enough that DHCP is failing, then delaying the roam is probably the > worst choice we can make, no? It may not be that the connection quality caused DHCP to take a while, could be the backend just was slow and not responding quickly. > > I would think the better way to fix this would be to track netconfig > state, and if it never reached a fully configured state prior to > roaming, reset & reconfigure upon a successful "roam". Note that since > we haven't signaled "connected" just yet, we're really still just in the > "connecting" phase. The question is if the AP infrastructure could handle this. For example the worst case being if DHCP started and we roamed right in the middle... That's why I thinking the safest option was to complete DHCP, then roam. I never actually saw this behavior happening, just noticed that IWD would get very confused if it did happen. Thanks, James > >> --- >> src/station.c | 16 ++++++++++++++++ >> 1 file changed, 16 insertions(+) >> > > Regards, > -Denis
Hi James, >> I'm not convinced this is the right approach. If the signal is bad enough >> that DHCP is failing, then delaying the roam is probably the worst choice we >> can make, no? > > It may not be that the connection quality caused DHCP to take a while, could be > the backend just was slow and not responding quickly. Possible. But I don't think this is a good reason to remain on a bad AP. Besides, what if iwd is not running DHCP and it is being done by some external app? In that case iwd will happily roam off anyway and the external app will restart dhcp. So remaining on the bad AP is asymmetrical to the EnableNetworkConfiguration=False situation. > >> >> I would think the better way to fix this would be to track netconfig state, >> and if it never reached a fully configured state prior to roaming, reset & >> reconfigure upon a successful "roam". Note that since we haven't signaled >> "connected" just yet, we're really still just in the "connecting" phase. > > The question is if the AP infrastructure could handle this. For example the > worst case being if DHCP started and we roamed right in the middle... That's why > I thinking the safest option was to complete DHCP, then roam. I never actually > saw this behavior happening, just noticed that IWD would get very confused if it > did happen. Well, see above. Infrastructure has to handle this possibility anyway. But yes, this is certainly a corner case we are not handling right now. Regards, -Denis
diff --git a/src/station.c b/src/station.c index dfc23e24..18c37f9c 100644 --- a/src/station.c +++ b/src/station.c @@ -2786,6 +2786,22 @@ static void station_roam_trigger_cb(struct l_timeout *timeout, void *user_data) if (station_cannot_roam(station)) return; + /* + * This is unlikely, but can still happen when network configuration is + * enabled and DHCP is slow. If IWD connects to a BSS under the low RSSI + * threshold it will immediately get a CQM event once connected (as far + * as the kernel is concerned, NOT IWD's connected state). This starts + * a 5 second timer and if DHCP has not completed by then IWD will + * roam. Since this would likely confused the APs and IWD its best to + * let DHCP complete i.e. wait until we are in a connected state. + */ + if (station->state != STATION_STATE_CONNECTED) { + l_debug("Tried to roam in unconnected state, will try " + "again in 5 seconds"); + station_roam_timeout_rearm(station, 5); + return; + } + station_start_roam(station); }