Message ID | 20221117152155.2310412-1-alvin@pqrs.dk (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | scan: make scan requests fail if trigger returns -EBUSY | expand |
Context | Check | Description |
---|---|---|
tedd_an/pre-ci_am | success | Success |
prestwoj/iwd-alpine-ci-fetch | success | Fetch PR |
prestwoj/iwd-ci-gitlint | success | GitLint |
prestwoj/iwd-ci-fetch | success | Fetch PR |
prestwoj/iwd-ci-makedistcheck | success | Make Distcheck |
prestwoj/iwd-ci-incremental_build | success | Incremental build not run PASS |
prestwoj/iwd-alpine-ci-makedistcheck | success | Make Distcheck |
prestwoj/iwd-alpine-ci-incremental_build | success | Incremental build not run PASS |
prestwoj/iwd-ci-build | success | Build - Configure |
prestwoj/iwd-alpine-ci-build | success | Build - Configure |
prestwoj/iwd-ci-makecheckvalgrind | success | Make Check w/Valgrind |
prestwoj/iwd-ci-clang | success | clang PASS |
prestwoj/iwd-ci-makecheck | success | Make Check |
prestwoj/iwd-alpine-ci-makecheckvalgrind | success | Make Check w/Valgrind |
prestwoj/iwd-alpine-ci-makecheck | success | Make Check |
prestwoj/iwd-ci-testrunner | success | test-runner PASS |
Hi Alvin, On 11/17/22 09:21, Alvin Šipraga wrote: > From: Alvin Šipraga <alsi@bang-olufsen.dk> > > If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes > that a scan is ongoing on the underlying wdev and will retry the same > command when that scan is complete. It gets notified of that completion > via the scan_notify() function, and kicks the scan logic to try again. > > However, if there is another wdev on the same wiphy and that wdev has a > scan request in flight, the kernel will also return -EBUSY. In other > words, only one scan request per wiphy is permitted. This should really not be possible since all scans go through the wiphy work queue? > > As an example, the brcmfmac driver can create an AP interface on the > same wiphy as the default station interface, and scans can be triggered > on that AP interface. Or are you setting up a second interface out of iwd's control? > > If -EBUSY is returned because another wdev is scanning, then iwd won't > know when it can retry the original trigger request because the relevant > netlink event will arrive on a different wdev. Indeed, if no scan > context exists for that other wdev, then scan_notify will return early > and the scan logic will stall indefinitely. So I think the way to handle this is to make scan aware of all interfaces, even foreign ones, and consider restarting a scan request whenever a scan finished notification on a given phy is received. > > From a user perspective, this leads to a situation where iwd reports > over DBus that it is not scanning, but it never queues up another > periodic scan. Or if a scan was explicitly requested over DBus and the > trigger for that scan ended in an -EBUSY for the aforementioned reason, > the DBus method call stall indefinitely. > > Since scan contexts are handled on a per-wdev basis, and since multiple > wdevs may in principle share the same wiphy, solve the above problem by > instead having the scan module simply propagate the -EBUSY error. In > practice this will mean that scans may fail more often, in which case at > least one default error message will be logged each time: I doubt this is a good idea unless you want to visit every call site that invokes the scan logic and make it handle -EBUSY gracefully, since it is a bit of a special case. Regards, -Denis
Hi Denis, Thanks for your quick reply. On Thu, Nov 17, 2022 at 10:29:22AM -0600, Denis Kenzior wrote: > Hi Alvin, > > On 11/17/22 09:21, Alvin Šipraga wrote: > > From: Alvin Šipraga <alsi@bang-olufsen.dk> > > > > If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes > > that a scan is ongoing on the underlying wdev and will retry the same > > command when that scan is complete. It gets notified of that completion > > via the scan_notify() function, and kicks the scan logic to try again. > > > > However, if there is another wdev on the same wiphy and that wdev has a > > scan request in flight, the kernel will also return -EBUSY. In other > > words, only one scan request per wiphy is permitted. > > This should really not be possible since all scans go through the wiphy work queue? > > > > > As an example, the brcmfmac driver can create an AP interface on the > > same wiphy as the default station interface, and scans can be triggered > > on that AP interface. > > Or are you setting up a second interface out of iwd's control? This. > > > > > If -EBUSY is returned because another wdev is scanning, then iwd won't > > know when it can retry the original trigger request because the relevant > > netlink event will arrive on a different wdev. Indeed, if no scan > > context exists for that other wdev, then scan_notify will return early > > and the scan logic will stall indefinitely. > > So I think the way to handle this is to make scan aware of all interfaces, > even foreign ones, and consider restarting a scan request whenever a scan > finished notification on a given phy is received. Yes, I also considered doing something like that: - have scan_notify take notice of all NEW_SCAN_RESULTS/SCAN_ABORTED events; - if a scan context is found for the wdev associated with the event, preserve the old logic; - if a scan context isn't found, then search the same queue with an alternate match function like this: static bool scan_context_match_wiphy(const void *a, const void *b) { const struct scan_context *sc = a; const uint64_t *wiphy_id = b; /* Find a scan context on the same wiphy, as it might be held up */ if (sc->wiphy_id != *wiphy_id) return false; /* It should have a request pending... */ if (!l_queue_peek_head(sc->requests)) return false; /* This means the next request is worth retrying */ return true; } - then if such a scan context is found, we can reset the scanning state to SCAN_STATE_NOT_RUNNING and call start_next_scan_request() If that sounds semi-reasonable I can draft a patch and we can discuss it further. One theoretical shortcoming of the above is that it is not "fair", i.e. if you have multiple wdevs on the same wiphy then their scan request queues will be fully drained before the next wdev gets a chance to scan. But I guess it's OK. > > > > > From a user perspective, this leads to a situation where iwd reports > > over DBus that it is not scanning, but it never queues up another > > periodic scan. Or if a scan was explicitly requested over DBus and the > > trigger for that scan ended in an -EBUSY for the aforementioned reason, > > the DBus method call stall indefinitely. > > > > Since scan contexts are handled on a per-wdev basis, and since multiple > > wdevs may in principle share the same wiphy, solve the above problem by > > instead having the scan module simply propagate the -EBUSY error. In > > practice this will mean that scans may fail more often, in which case at > > least one default error message will be logged each time: > > I doubt this is a good idea unless you want to visit every call site that > invokes the scan logic and make it handle -EBUSY gracefully, since it is a > bit of a special case. OK, let me try the other way then :) Kind regards, Alvin
Hi Alvin, >> Or are you setting up a second interface out of iwd's control? > > This. > Ok, makes sense. >> >>> >>> If -EBUSY is returned because another wdev is scanning, then iwd won't >>> know when it can retry the original trigger request because the relevant >>> netlink event will arrive on a different wdev. Indeed, if no scan >>> context exists for that other wdev, then scan_notify will return early >>> and the scan logic will stall indefinitely. >> >> So I think the way to handle this is to make scan aware of all interfaces, >> even foreign ones, and consider restarting a scan request whenever a scan >> finished notification on a given phy is received. > > Yes, I also considered doing something like that: > > - have scan_notify take notice of all NEW_SCAN_RESULTS/SCAN_ABORTED > events; > - if a scan context is found for the wdev associated with the event, > preserve the old logic; I'm not aware of any devices that can invoke multiple scans. It is always limited to 1 / phy AFAIK, although that might change in the future. So really a scan finished / aborted notification should be treated as per phy and not per-wdev. > - if a scan context isn't found, then search the same queue with an > alternate match function like this: > > static bool scan_context_match_wiphy(const void *a, const void *b) > { > const struct scan_context *sc = a; > const uint64_t *wiphy_id = b; > > /* Find a scan context on the same wiphy, as it might be held up */ > if (sc->wiphy_id != *wiphy_id) > return false; > > /* It should have a request pending... */ > if (!l_queue_peek_head(sc->requests)) > return false; > > /* This means the next request is worth retrying */ > return true; > } Yep, something like this. The wiphy work queue would have a 'running' request and if that request is a scan, then that's the one that should be restarted. > > - then if such a scan context is found, we can reset the scanning state > to SCAN_STATE_NOT_RUNNING and call start_next_scan_request() > > If that sounds semi-reasonable I can draft a patch and we can discuss it > further. One theoretical shortcoming of the above is that it is not > "fair", i.e. if you have multiple wdevs on the same wiphy then their > scan request queues will be fully drained before the next wdev gets a > chance to scan. But I guess it's OK. Not sure what you mean? Internally iwd operates on 'logical' scan requests. These might be broken up into multiple actual scan requests to the driver, but each logical scan request is wrapped by a wiphy work request. So multiple logical scan requests from multiple wdevs would be interleaved / scheduled appropriately. But yes, I think what you propose here is the right approach. Regards, -Denis
diff --git a/src/scan.c b/src/scan.c index 5548914a12de..b487c342129d 100644 --- a/src/scan.c +++ b/src/scan.c @@ -229,12 +229,6 @@ static void scan_request_triggered(struct l_genl_msg *msg, void *userdata) err = l_genl_msg_get_error(msg); if (err < 0) { - /* Scan in progress, assume another scan is running */ - if (err == -EBUSY) { - sc->state = SCAN_STATE_PASSIVE; - return; - } - scan_request_failed(sc, sr, err); l_error("Received error during CMD_TRIGGER_SCAN: %s (%d)", @@ -2094,7 +2088,6 @@ static void scan_notify(struct l_genl_msg *msg, void *user_data) { struct scan_freq_set *freqs; bool send_next = false; - bool retry = false; bool get_results = false; sc->state = SCAN_STATE_NOT_RUNNING; @@ -2138,24 +2131,17 @@ static void scan_notify(struct l_genl_msg *msg, void *user_data) /* * Drop the ongoing scan if an external scan flushed - * our results. Otherwise, try to retry the trigger - * request if it failed with an -EBUSY. + * our results. */ if (sr && sr->started && scan_parse_flush_flag_from_msg(msg)) scan_finished(sc, -EAGAIN, NULL, NULL, sr); - else - retry = true; sr = NULL; } - /* - * Send the next command of an ongoing request, or continue - * with a previously busy scan attempt due to an external - * scan. - */ - if (send_next || retry) { + /* Send the next command of an ongoing request */ + if (send_next) { struct scan_request *next = l_queue_peek_head( sc->requests); @@ -2220,7 +2206,7 @@ static void scan_notify(struct l_genl_msg *msg, void *user_data) * we may be able to now queue our own scan although * the abort could also have been triggered by the * hardware or the driver because of another activity - * starting in which case we should just get an EBUSY. + * starting in which case we should get an EBUSY. */ start_next_scan_request(&sr->work); } diff --git a/src/station.c b/src/station.c index eab16eff5afa..f12ca22a8648 100644 --- a/src/station.c +++ b/src/station.c @@ -3845,7 +3845,7 @@ static void station_dbus_scan_triggered(int err, void *user_data) dbus_pending_reply(&station->scan_pending, reply); } - station_dbus_scan_done(station, true); + station_dbus_scan_done(station, false); return; }
From: Alvin Šipraga <alsi@bang-olufsen.dk> If a CMD_TRIGGER_SCAN request fails with -EBUSY, iwd currently assumes that a scan is ongoing on the underlying wdev and will retry the same command when that scan is complete. It gets notified of that completion via the scan_notify() function, and kicks the scan logic to try again. However, if there is another wdev on the same wiphy and that wdev has a scan request in flight, the kernel will also return -EBUSY. In other words, only one scan request per wiphy is permitted. As an example, the brcmfmac driver can create an AP interface on the same wiphy as the default station interface, and scans can be triggered on that AP interface. If -EBUSY is returned because another wdev is scanning, then iwd won't know when it can retry the original trigger request because the relevant netlink event will arrive on a different wdev. Indeed, if no scan context exists for that other wdev, then scan_notify will return early and the scan logic will stall indefinitely. From a user perspective, this leads to a situation where iwd reports over DBus that it is not scanning, but it never queues up another periodic scan. Or if a scan was explicitly requested over DBus and the trigger for that scan ended in an -EBUSY for the aforementioned reason, the DBus method call stall indefinitely. Since scan contexts are handled on a per-wdev basis, and since multiple wdevs may in principle share the same wiphy, solve the above problem by instead having the scan module simply propagate the -EBUSY error. In practice this will mean that scans may fail more often, in which case at least one default error message will be logged each time: iwd[1451]: Received error during CMD_TRIGGER_SCAN: Device or resource busy (16) It can be debated whether or not this message is needed: maybe the upper-layer may wish to implement its own retry logic instead. In the case of a periodic scan for example, a failure is met with ambivalence: no additional error is logged, and a new scan is simply scheduled according to the periodic scan interval. In other cases, a scan might be part of a DBus request to connect to a hidden network. Here, the error will be different: Failure to connect because the network wasn't found after scanning: $ iwctl station wlan0 connect-hidden foobar Object not found Failure to connect because another scan was active on the same wiphy: $ iw dev uap0 scan >/dev/null & $ iwctl station wlan0 connect-hidden foobar Operation failed This is the trade-off made by this change in order not to complicate the scan context with lookup-by-wiphy and an internal retry mechanism, while still providing robustness and not stalling the scan logic of the daemon. Additionally, in the event of a failed scan trigger via the DBus scan method, do not automatically jumpstart the autoconnect logic, as this would not make sense. --- src/scan.c | 22 ++++------------------ src/station.c | 2 +- 2 files changed, 5 insertions(+), 19 deletions(-)