diff mbox

cfg80211: use IDA to allocate wiphy indeces

Message ID 20180621012945.185705-1-briannorris@chromium.org (mailing list archive)
State Rejected
Delegated to: Johannes Berg
Headers show

Commit Message

Brian Norris June 21, 2018, 1:29 a.m. UTC
It's annoying to see the phy index increase arbitrarily, just because a
device got removed and re-probed (e.g., during a device reset, or due to
probe testing). We can use the in-kernel index allocator for this,
instead of just an increasing counter.

Signed-off-by: Brian Norris <briannorris@chromium.org>
---
 net/wireless/core.c | 31 ++++++++++++++-----------------
 1 file changed, 14 insertions(+), 17 deletions(-)

Comments

Johannes Berg June 29, 2018, 7:42 a.m. UTC | #1
Hi Brian,

On Wed, 2018-06-20 at 18:29 -0700, Brian Norris wrote:
> It's annoying to see the phy index increase arbitrarily, just because a
> device got removed and re-probed (e.g., during a device reset, or due to
> probe testing). We can use the in-kernel index allocator for this,
> instead of just an increasing counter.

I can understand that it's somewhat annoying to people, but it was
actually done on purpose to avoid userspace talking to the wrong device.

Imagine you have some userspace process running that has remembered the
wiphy index to use it to talk to nl80211, and now underneath the device
goes away and reappears. This process should understand that situation,
and handle it accordingly, rather than being blind to the reset.

johannes
Brian Norris June 29, 2018, 6:48 p.m. UTC | #2
Hi Johannes,

On Fri, Jun 29, 2018 at 09:42:20AM +0200, Johannes Berg wrote:
> On Wed, 2018-06-20 at 18:29 -0700, Brian Norris wrote:
> > It's annoying to see the phy index increase arbitrarily, just because a
> > device got removed and re-probed (e.g., during a device reset, or due to
> > probe testing). We can use the in-kernel index allocator for this,
> > instead of just an increasing counter.
> 
> I can understand that it's somewhat annoying to people, but it was
> actually done on purpose to avoid userspace talking to the wrong device.

Hmm, interesting. I'm not dead-set on this patch, so if there are good
reasons to reject it, I won't fret.

> Imagine you have some userspace process running that has remembered the
> wiphy index to use it to talk to nl80211, and now underneath the device
> goes away and reappears. This process should understand that situation,
> and handle it accordingly, rather than being blind to the reset.

How is this different from the wlan (netdev) device naming? We allow
'wlan0' to leave and return under the same name. Isn't the right answer
that user space should be listening for udev and/or netlink events?

Brian
Ben Greear June 29, 2018, 7:01 p.m. UTC | #3
On 06/29/2018 11:48 AM, Brian Norris wrote:
> Hi Johannes,
>
> On Fri, Jun 29, 2018 at 09:42:20AM +0200, Johannes Berg wrote:
>> On Wed, 2018-06-20 at 18:29 -0700, Brian Norris wrote:
>>> It's annoying to see the phy index increase arbitrarily, just because a
>>> device got removed and re-probed (e.g., during a device reset, or due to
>>> probe testing). We can use the in-kernel index allocator for this,
>>> instead of just an increasing counter.
>>
>> I can understand that it's somewhat annoying to people, but it was
>> actually done on purpose to avoid userspace talking to the wrong device.
>
> Hmm, interesting. I'm not dead-set on this patch, so if there are good
> reasons to reject it, I won't fret.
>
>> Imagine you have some userspace process running that has remembered the
>> wiphy index to use it to talk to nl80211, and now underneath the device
>> goes away and reappears. This process should understand that situation,
>> and handle it accordingly, rather than being blind to the reset.
>
> How is this different from the wlan (netdev) device naming? We allow
> 'wlan0' to leave and return under the same name. Isn't the right answer
> that user space should be listening for udev and/or netlink events?
>
> Brian
>

For what it is worth, we use udev to rename the phyX to wiphyZ devices based on
their MAC address, and that seems to work OK.

I can't think of any reason why user-space would need the phy index number
to increase as modules are loaded/unloaded though.

Thanks,
Ben
Johannes Berg July 6, 2018, 11:57 a.m. UTC | #4
Hi Brian,

> > Imagine you have some userspace process running that has remembered the
> > wiphy index to use it to talk to nl80211, and now underneath the device
> > goes away and reappears. This process should understand that situation,
> > and handle it accordingly, rather than being blind to the reset.
> 
> How is this different from the wlan (netdev) device naming? We allow
> 'wlan0' to leave and return under the same name. Isn't the right answer
> that user space should be listening for udev and/or netlink events?

Well, first of all - for netdev *naming* these things differ in that
even if you get "wlan0" back, it will in fact have a new interface index
which hasn't been used before. So tools that are not aware of changes
since they don't listen will (hopefully) look up the interface index by
name once, and then keep using that, and then get failures on the
renames.

This doesn't even have to be all that long-running btw, it could be you
enter "iw wlan0 scan" and somewhere between looking up the wlan0
interface index and actually trying to do an operation on it your hw
crashes and the interface goes way. Or similar.

Now, with phy0 there's an additional limitation in that we made it so
you could only use "phyX" for X == phy index. This wasn't there
originally, and technically isn't really needed, but there are
races/issues with this.

In commit 7623225f90526, which really is a revert of Ben's patch that
always used the lowest number for the phy *name*. It looks like after I
had to revert that patch, Ben decided to just name them "wiphyX" with a
low number X in userspace, which is obviously fine.

I think the way to satisfy all of the different concerns around this
would be to track - separately - which phyX *names* (are going to) exist
in the system. As commit 7623225f90526 pointed out:

    This reverts commit 5a254ffe3ffdfa84fe076009bd8e88da412180d2.
    
    The commit failed to take into account that allocated wireless devices
    (wiphys) are not added into the device list upon allocation, but only
    when they are registered. Therefore, it opened up a race between
    allocating and registering a name, so that if two processes allocate and
    register concurrently ("alloc, alloc, register, register" rather than
    "alloc, register, alloc, register") the code will attempt to use the
    same name twice.


The IDA code you wrote avoids this situation because you add the wiphy
index to the IDA data structure on *allocation*, vs. relying just on the
regular rdev list like in Ben's commit.

So, to address my concerns about not reusing the number, I think we
could just decouple the phyX from the wiphy index X (iw has some magic
"phy#x" to use the actual wiphy index if you need to).

Then we can use the IDA to track the allocated *names*, and keep the
actual underlying *index* the same as today - similar to what you
observe with netdevs, e.g. wlan0.

The only complexity is that you have to track this when wiphys are being
renamed, both on renaming away from "phyX" (to free the name index X),
but also on renaming *to* "phyX" to reserve the name index X and fail
the rename if it's already reserved even though the name doesn't show up
on the output of "iw list" yet because it's not registered yet.

johannes
diff mbox

Patch

diff --git a/net/wireless/core.c b/net/wireless/core.c
index c0fd8a85e7f7..80c108c3ca38 100644
--- a/net/wireless/core.c
+++ b/net/wireless/core.c
@@ -8,6 +8,7 @@ 
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/idr.h>
 #include <linux/if.h>
 #include <linux/module.h>
 #include <linux/err.h>
@@ -380,11 +381,11 @@  static void cfg80211_propagate_cac_done_wk(struct work_struct *work)
 
 /* exported functions */
 
+static DEFINE_IDA(wiphy_ida);
+
 struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 			   const char *requested_name)
 {
-	static atomic_t wiphy_counter = ATOMIC_INIT(0);
-
 	struct cfg80211_registered_device *rdev;
 	int alloc_size;
 
@@ -413,18 +414,12 @@  struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 
 	rdev->ops = ops;
 
-	rdev->wiphy_idx = atomic_inc_return(&wiphy_counter);
-
+	rdev->wiphy_idx = ida_simple_get(&wiphy_ida, 0, 0, GFP_KERNEL);
 	if (unlikely(rdev->wiphy_idx < 0)) {
-		/* ugh, wrapped! */
-		atomic_dec(&wiphy_counter);
 		kfree(rdev);
 		return NULL;
 	}
 
-	/* atomic_inc_return makes it start at 1, make it start at 0 */
-	rdev->wiphy_idx--;
-
 	/* give it a proper name */
 	if (requested_name && requested_name[0]) {
 		int rv;
@@ -452,10 +447,8 @@  struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 		 * value, and use a different name if this one exists?
 		 */
 		rv = dev_set_name(&rdev->wiphy.dev, PHY_NAME "%d", rdev->wiphy_idx);
-		if (rv < 0) {
-			kfree(rdev);
-			return NULL;
-		}
+		if (rv < 0)
+			goto err;
 	}
 
 	INIT_LIST_HEAD(&rdev->wiphy.wdev_list);
@@ -497,10 +490,8 @@  struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 				   &rdev->wiphy.dev, RFKILL_TYPE_WLAN,
 				   &rdev->rfkill_ops, rdev);
 
-	if (!rdev->rfkill) {
-		kfree(rdev);
-		return NULL;
-	}
+	if (!rdev->rfkill)
+		goto err;
 
 	INIT_WORK(&rdev->rfkill_sync, cfg80211_rfkill_sync_work);
 	INIT_WORK(&rdev->conn_work, cfg80211_conn_work);
@@ -525,6 +516,11 @@  struct wiphy *wiphy_new_nm(const struct cfg80211_ops *ops, int sizeof_priv,
 	rdev->wiphy.max_sched_scan_plan_interval = U32_MAX;
 
 	return &rdev->wiphy;
+
+err:
+	ida_simple_remove(&wiphy_ida, rdev->wiphy_idx);
+	kfree(rdev);
+	return NULL;
 }
 EXPORT_SYMBOL(wiphy_new_nm);
 
@@ -972,6 +968,7 @@  void cfg80211_dev_free(struct cfg80211_registered_device *rdev)
 	}
 	list_for_each_entry_safe(scan, tmp, &rdev->bss_list, list)
 		cfg80211_put_bss(&rdev->wiphy, &scan->pub);
+	ida_simple_remove(&wiphy_ida, rdev->wiphy_idx);
 	kfree(rdev);
 }