mbox series

[RFC,0/4] Allow live MAC address change

Message ID 20190904191155.28056-1-prestwoj@gmail.com (mailing list archive)
Headers show
Series Allow live MAC address change | expand

Message

James Prestwood Sept. 4, 2019, 7:11 p.m. UTC
This set enables userspace to change the devices MAC address via RTNL
or CMD_CONNECT/CMD_AUTHENTICATE without bringing the device down then
back up. This has several benefits:

 - Simplifies user space logic. No longer requires tracking multiple
   RTNL/netlink message transactions (down, change mac, up, connect).
 - Drastically speeds up connection times when changing MAC on a
   per-connect basis.
 - Eliminates potential race conditions in userspace. At any time
   during the down/change/up sequence the device could be hot
   unplugged or something could go wrong in the kernel leaving
   userspace in an unknown state.

A new extended feature was added,
NL80211_EXT_FEATURE_LIVE_ADDRESS_CHANGE which is set if the device
supports this feature. This feature is enabled by default for mac80211
based drivers.

A new NL80211 attribute was added, NL80211_ATTR_MAC_TO_CHANGE. This
attribute can be used with CMD_CONNECT/CMD_AUTHENTICATE to change the
MAC address automatically without having to go through RTNL.

I have taken some timings for all 3 ways of changing the MAC. Powered
change via RTNL, non powered via RTNL, and changing through
CMD_CONNECT. All times were taken in microseconds and tested on an
Intel 7260 PCI wireless adapter:

Powered via RTNL:

Average: 294508.9
Min: 284523
Max: 300345

==================================
Non-Powered via RTNL:

Average: 14824.7
Min: 11037
Max: 17812

Speedup from powered change: 19.87x (average)

==================================
via CMD_CONNECT:

Average: 11848.7
Min: 9748
Max: 17152

Speedup from powered change: 24.86x (average)

==================================

The speed increases between powered and non-powered are quite
impressive. Besides the obvious speedup, the amount of code requred
for userspace to take advantage of this feature is extremely small
(via CMD_CONNECT, essentially a feature check and an extra attribute).
In addition, the changes/complexity to the kernel is minimal.

James Prestwood (4):
  nl80211: Add LIVE_ADDR_CHANGE feature
  {nl|cfg}80211: Support mac change as part of SME Connect
  mac80211: Support LIVE_ADDRESS_CHANGE feature
  {nl,cfg}nl80211: Support mac change for mlme_authenticate

 include/net/cfg80211.h       |  1 +
 include/uapi/linux/nl80211.h | 10 +++++++
 net/mac80211/iface.c         | 51 ++++++++++++++++++++++++++++++++++--
 net/mac80211/main.c          |  1 +
 net/wireless/core.h          |  4 ++-
 net/wireless/mlme.c          |  9 ++++++-
 net/wireless/nl80211.c       | 18 ++++++++++++-
 net/wireless/sme.c           |  9 ++++++-
 net/wireless/util.c          | 11 ++++++++
 9 files changed, 108 insertions(+), 6 deletions(-)

Comments

Johannes Berg Sept. 11, 2019, 9:09 a.m. UTC | #1
Hi James,

TBH, I'm still not really convinced.

> I have taken some timings for all 3 ways of changing the MAC. Powered
> change via RTNL, non powered via RTNL, and changing through
> CMD_CONNECT. All times were taken in microseconds and tested on an
> Intel 7260 PCI wireless adapter:

From where to where did you measure? I mean, clearly you cannot have
counted all the way to the connection?

> Powered via RTNL:
> 
> Average: 294508.9
> Min: 284523
> Max: 300345
> 
> ==================================
> Non-Powered via RTNL:
> 
> Average: 14824.7
> Min: 11037
> Max: 17812
> 
> Speedup from powered change: 19.87x (average)

I'm assuming that this version is the IFF_LIVE_ADDR_CHANGE + setting the
MAC address via RTNL?

If so, yeah, obviously not powering off the firmware will be much faster
than powering it off. That's an easy win really.

> ==================================
> via CMD_CONNECT:
> 
> Average: 11848.7
> Min: 9748
> Max: 17152
> 
> Speedup from powered change: 24.86x (average)

And this really only gives you a gain of 3ms.

That'd be nice, but like I said before, it's not the only thing we/you
should be thinking about.

One fundamental issue I have with this is that you're now combining
together temporary with persistent state changes. After a disconnection
(or connection failure), the interface usually goes back to its previous
state. With this change, you're keeping the MAC address modified though.

Sure, you don't care (because you're probably going use a new random
address later anyway), but these are still things we should consider in
an API.

I'll happily take the subset of the patches that implements the
IFF_LIVE_ADDR_CHANGE in mac80211, but I don't think the 3ms win there
wins over having a well-defined API.

johannes
James Prestwood Sept. 11, 2019, 3:54 p.m. UTC | #2
On Wed, 2019-09-11 at 11:09 +0200, Johannes Berg wrote:
> Hi James,
> 
> TBH, I'm still not really convinced.
> 
> > I have taken some timings for all 3 ways of changing the MAC.
> > Powered
> > change via RTNL, non powered via RTNL, and changing through
> > CMD_CONNECT. All times were taken in microseconds and tested on an
> > Intel 7260 PCI wireless adapter:
> 
> From where to where did you measure? I mean, clearly you cannot have
> counted all the way to the connection?

I could have made this a bit more clear. I initially did measure the
time to a full connection, including EAPoL, but the more I timed the
more chance there was for scheduling delays that really threw off the
results. Not that these results weren't valid, I just would have needed
to time many many more runs to get a decent averaged time. The method
of timings I took just isolated things a bit better.

For the three methods below I measured the time from the connection
initiation (either powering down via RTNL, changing MAC via RTNL, or
sending CMD_CONNECT) until we got a success callback from CMD_CONNECT,
including changing the MAC via RTNL in those cases.

> 
> > Powered via RTNL:
> > 
> > Average: 294508.9
> > Min: 284523
> > Max: 300345
> > 
> > ==================================
> > Non-Powered via RTNL:
> > 
> > Average: 14824.7
> > Min: 11037
> > Max: 17812
> > 
> > Speedup from powered change: 19.87x (average)
> 
> I'm assuming that this version is the IFF_LIVE_ADDR_CHANGE + setting
> the
> MAC address via RTNL?
> 
> If so, yeah, obviously not powering off the firmware will be much
> faster
> than powering it off. That's an easy win really.

Yep exactly.

> 
> > ==================================
> > via CMD_CONNECT:
> > 
> > Average: 11848.7
> > Min: 9748
> > Max: 17152
> > 
> > Speedup from powered change: 24.86x (average)
> 
> And this really only gives you a gain of 3ms.
> 
> That'd be nice, but like I said before, it's not the only thing
> we/you
> should be thinking about.
> 
> One fundamental issue I have with this is that you're now combining
> together temporary with persistent state changes. After a
> disconnection
> (or connection failure), the interface usually goes back to its
> previous
> state. With this change, you're keeping the MAC address modified
> though.
> 
> Sure, you don't care (because you're probably going use a new random
> address later anyway), but these are still things we should consider
> in
> an API.

Yeah, in IWD's case if this feature is supported we would be doing the
MAC change on every connection (unless already changed previously) for
privacy reasons.

Out of curiosity how this behavior is different than the power down +
RTNL MAC change (the current way of doing things)? If you power down
the device, change the MAC, then power up does that MAC get reset after
a disconnection/failure?

> 
> I'll happily take the subset of the patches that implements the
> IFF_LIVE_ADDR_CHANGE in mac80211, but I don't think the 3ms win there
> wins over having a well-defined API.

Sure, that would be great. That is definitely still a improvement to
the existing way of doing things.

Thanks,
James

> 
> johannes
>
Johannes Berg Sept. 11, 2019, 6:25 p.m. UTC | #3
On Wed, 2019-09-11 at 08:54 -0700, James Prestwood wrote:
> 
> I could have made this a bit more clear. I initially did measure the
> time to a full connection, including EAPoL, but the more I timed the
> more chance there was for scheduling delays that really threw off the
> results. Not that these results weren't valid, I just would have needed
> to time many many more runs to get a decent averaged time. The method
> of timings I took just isolated things a bit better.

Sure, makes sense, and I didn't think you were doing that, I was just
wondering what exactly you did measure.

> For the three methods below I measured the time from the connection
> initiation (either powering down via RTNL, changing MAC via RTNL, or
> sending CMD_CONNECT) until we got a success callback from CMD_CONNECT,
> including changing the MAC via RTNL in those cases.

Ah, ok.

> Out of curiosity how this behavior is different than the power down +
> RTNL MAC change (the current way of doing things)? If you power down
> the device, change the MAC, then power up does that MAC get reset after
> a disconnection/failure?

No, of course not? But then you're explicitly issuing a command ("change
the MAC address") that is supposed to affect state indefinitely, vs.
issuing a command ("please connect") that isn't really meant to.

If there was one thing that we learned from wext, IMHO it was that
keeping all the state in the kernel is bad for you, and it's much better
to handle things if the state gets reset when you disconnect etc. In
most places that's what we do now and I think it has served us well, so
I'm very reluctant to mix things that need state in the kernel with
those that don't.

(You might not remember wext, but you'd have to issue a bunch of
commands in the right order, and it would keep all the state inbetween;
if you forgot to clear the BSSID after setting it, it'd be remembered
and you couldn't connect to a new AP unless you reset it, etc.)

Thanks,
johannes
James Prestwood Sept. 11, 2019, 7:20 p.m. UTC | #4
Hi Johannes,

> > Out of curiosity how this behavior is different than the power down
> > +
> > RTNL MAC change (the current way of doing things)? If you power
> > down
> > the device, change the MAC, then power up does that MAC get reset
> > after
> > a disconnection/failure?
> 
> No, of course not? But then you're explicitly issuing a command
> ("change
> the MAC address") that is supposed to affect state indefinitely, vs.
> issuing a command ("please connect") that isn't really meant to.

I see what your saying, but theses kind of state changes are all over
the place in other APIs, and undocumented: One example is
SCAN_FLAG_FLUSH clearing out the scanning state for all other
processes. I'm sure I could find more. If we documented this attribute
and behavior I don't see an issue.

This is also no different than changing the MAC via SET_INTERFACE and
having CMD_CONNECT fail; the MAC still persists but instead we payed an
extra 3ms for the same result.

I know 3ms doesn't seems like a lot but everything counts and from my
testing this is even a further 20% improvement to doing so with RTNL.
Plus the added simplicity to the userspace code/API. We have taken a
lot of time to optimize IWD's connection times, and everything counts.
The connection times are fast already, but when there is room for
improvement we will push for it, especially in situations like this
when the change is quite minimal and does not introduce much
complexity.

> 
> If there was one thing that we learned from wext, IMHO it was that
> keeping all the state in the kernel is bad for you, and it's much
> better
> to handle things if the state gets reset when you disconnect etc. In
> most places that's what we do now and I think it has served us well,
> so
> I'm very reluctant to mix things that need state in the kernel with
> those that don't.

I don't really agree that this change puts any more or less state in
the kernel. Even the current way of doing things userspace still needs
to maintain what it changed the MAC to, here is no different. And
again, documenting this attribute should leave no question about what
is happening.

> 
> (You might not remember wext, but you'd have to issue a bunch of
> commands in the right order, and it would keep all the state
> inbetween;

I was not around for that, but yes that sounds bad. The difference
though is we are not issuing a bunch of state changing commands in a
row to achieve a single goal. We are issuing one single command,
CMD_CONNECT or CMD_AUTHENTICATE.

Thanks,
James

> if you forgot to clear the BSSID after setting it, it'd be remembered
> and you couldn't connect to a new AP unless you reset it, etc.)
> 
> Thanks,
> johannes
>
Kalle Valo Sept. 13, 2019, 10:24 a.m. UTC | #5
James Prestwood <prestwoj@gmail.com> writes:

> I know 3ms doesn't seems like a lot but everything counts and from my
> testing this is even a further 20% improvement to doing so with RTNL.
> Plus the added simplicity to the userspace code/API. We have taken a
> lot of time to optimize IWD's connection times, and everything counts.
> The connection times are fast already, but when there is room for
> improvement we will push for it, especially in situations like this
> when the change is quite minimal and does not introduce much
> complexity.

So what kind of _total_ connection times you get now?
James Prestwood Sept. 13, 2019, 4:17 p.m. UTC | #6
Hi Kalle,

On Fri, 2019-09-13 at 13:24 +0300, Kalle Valo wrote:
> James Prestwood <prestwoj@gmail.com> writes:
> 
> > I know 3ms doesn't seems like a lot but everything counts and from
> > my
> > testing this is even a further 20% improvement to doing so with
> > RTNL.
> > Plus the added simplicity to the userspace code/API. We have taken
> > a
> > lot of time to optimize IWD's connection times, and everything
> > counts.
> > The connection times are fast already, but when there is room for
> > improvement we will push for it, especially in situations like this
> > when the change is quite minimal and does not introduce much
> > complexity.
> 
> So what kind of _total_ connection times you get now?
> 

This really depends. Most of the optimizations I was referencing are
due to scanning optimizations and moving DHCP into IWD itself, but both
of these are kinda irrelevant in this case so I wont consider them.

With this change, looking at the time from CMD_CONNECT until EAPoL/key
setting has finished I calculated 111.4ms on average. This is about a
3.5x speed up from the current method (Power down + RTNL) which I
calculated to be 391.8ms average. Note, this is rough (averaged only 5
runs just now).

So the savings are still significant even if you look at the full
connection times. The difference between doing the MAC change with RTNL
vs CMD_CONNECT are not as drastic, but from my perspective I would say
what's the harm? Your gaining further speed ups with really no added
complexity.

Thanks,
James
Johannes Berg Sept. 13, 2019, 6:49 p.m. UTC | #7
On Wed, 2019-09-11 at 12:20 -0700, James Prestwood wrote:

> I see what your saying, but theses kind of state changes are all over
> the place in other APIs, and undocumented: One example is
> SCAN_FLAG_FLUSH clearing out the scanning state for all other
> processes.

Scanning always changes scan list state?

> I'm sure I could find more. If we documented this attribute
> and behavior I don't see an issue.

But I'm sure you could actually find an example :-)

That doesn't really mean it's the *right* thing to do though, IMHO.

Also, who says that this is the only thing? Next up, somebody wants to
randomize the MTU? Ok, probably not, but you could pick a random other
rtnetlink attribute and have nl80211 set it. Where do we stop?

Thinking this to the extreme - why not add an rtnetlink message
interpreter into this code? ;-)

Sure, none of that is really seriously likely to happen, but I'm really
not convinced we (more or less arbitrarily) need many ways of doing the
same thing in the kernel.

Either way, regardless of that discussion, I think it'd be good if you
could repost the patches for just the "quick win" that we can all agree
on, and then we can get those reviewed and into the tree before we need
to continue this discussion; after all, while we're discussing saving
about 3 milliseconds, you're still wasting around 280 :-)

(and the easy one can be done without affecting the other, just need to
reorder the patches and split them a bit differently)

johannes
Kalle Valo Sept. 17, 2019, 7:46 a.m. UTC | #8
James Prestwood <prestwoj@gmail.com> writes:

> On Fri, 2019-09-13 at 13:24 +0300, Kalle Valo wrote:
>> James Prestwood <prestwoj@gmail.com> writes:
>> 
>> > I know 3ms doesn't seems like a lot but everything counts and from
>> > my
>> > testing this is even a further 20% improvement to doing so with
>> > RTNL.
>> > Plus the added simplicity to the userspace code/API. We have taken
>> > a
>> > lot of time to optimize IWD's connection times, and everything
>> > counts.
>> > The connection times are fast already, but when there is room for
>> > improvement we will push for it, especially in situations like this
>> > when the change is quite minimal and does not introduce much
>> > complexity.
>> 
>> So what kind of _total_ connection times you get now?
>> 
>
> This really depends. Most of the optimizations I was referencing are
> due to scanning optimizations and moving DHCP into IWD itself, but both
> of these are kinda irrelevant in this case so I wont consider them.

For user experience scanning and DHCP are also important, what kind of
numbers you get when those are included? No need to have anything
precise, I would like just to get an understanding where we are
nowadays.

> With this change, looking at the time from CMD_CONNECT until EAPoL/key
> setting has finished I calculated 111.4ms on average. This is about a
> 3.5x speed up from the current method (Power down + RTNL) which I
> calculated to be 391.8ms average. Note, this is rough (averaged only 5
> runs just now).

Ok, thanks.

> So the savings are still significant even if you look at the full
> connection times. The difference between doing the MAC change with RTNL
> vs CMD_CONNECT are not as drastic, but from my perspective I would say
> what's the harm? Your gaining further speed ups with really no added
> complexity.

As you only provided one number it's clear that you are only working
with one driver. But for us it's not that simple, we have to support a
myriad of different types of hardware and there can be complications and
additions later on, even for simple features. Like the dynamic power
save support I submitted to mac80211 over 10 years which was supposed to
be simple, and still we talk almost every year how do we get it out of
mac80211 as it makes maintenance difficult.
Denis Kenzior Sept. 17, 2019, 3:40 p.m. UTC | #9
Hi Kalle,

> For user experience scanning and DHCP are also important, what kind of > numbers you get when those are included? No need to have anything> 
precise, I would like just to get an understanding where we are> nowadays.

Scanning heavily depends on the RF environment and the hardware.  In our 
testing ath9k takes stupid long to scan for example.

But in a sort of best case scenario, using limited scan and no mac 
change, iwd connects in ~300ms.  People have reported that they have not 
finished opening their laptop screen and they're connected, so at that 
level of latency, every millisecond is important and totally worth 
fighting for.  Randomizing the MAC would penalize our connection times 
by 2X (300 ms at least).  And Android folks have reported the penalty to 
be as high as 3 seconds.  So this needs to be fixed.  And do note that 
this is a feature every modern OS implements by default.

> 
> As you only provided one number it's clear that you are only working
> with one driver. But for us it's not that simple, we have to support a

Please don't jump to conclusions like you seem to be doing here.  James 
gave you one number that is pretty typical.  If you want us to provide 
numbers for other drivers under given conditions, just ask.  We have a 
framework for timing these.

> myriad of different types of hardware and there can be complications and
> additions later on, even for simple features. Like the dynamic power
> save support I submitted to mac80211 over 10 years which was supposed to
> be simple, and still we talk almost every year how do we get it out of
> mac80211 as it makes maintenance difficult.
> 

I'm not sure what point you're trying to make here?

Regards,
-Denis
James Prestwood Sept. 17, 2019, 4:11 p.m. UTC | #10
Hi Kalle,

> 
> As you only provided one number it's clear that you are only working
> with one driver. But for us it's not that simple, we have to support
> a
> myriad of different types of hardware and there can be complications
> and
> additions later on, even for simple features. Like the dynamic power
> save support I submitted to mac80211 over 10 years which was supposed
> to
> be simple, and still we talk almost every year how do we get it out
> of
> mac80211 as it makes maintenance difficult.

As Denis said we are happy to test other drivers, granted I do not have
*that* many cards on hand to test. I understand the kernel needs to
support all kinds of hardware, yes, but what is your concern with this
patch specifically? What kinds of issues do you foresee this causing?
Or is it that you simply don't know any you'd like to find out?

Thanks,
James
Bob Marcan Sept. 17, 2019, 6:44 p.m. UTC | #11
On Tue, 17 Sep 2019 10:40:49 -0500
Denis Kenzior <denkenz@gmail.com> wrote:

> Hi Kalle,
> 
> > For user experience scanning and DHCP are also important, what kind of > numbers you get when those are included? No need to have anything> precise, I would like just to get an understanding where we are> nowadays.  
> 
> Scanning heavily depends on the RF environment and the hardware.  In our testing ath9k takes stupid long to scan for example.
> 
> But in a sort of best case scenario, using limited scan and no mac change, iwd connects in ~300ms.  People have reported that they have not finished opening their laptop screen and they're connected, so at that level of latency, every millisecond is important and totally worth fighting for.  Randomizing the MAC would penalize our connection times by 2X (300 ms at least).  And Android folks have reported the penalty to be as high as 3 seconds.  So this needs to be fixed.  And do note that this is a feature every modern OS implements by default.

Randomizing the MAC is a stupid decision.
Do you realy expect that this will add something to the security?

B. Marčan
Ben Greear Sept. 17, 2019, 6:47 p.m. UTC | #12
On 9/17/19 11:44 AM, Bob Marcan wrote:
> On Tue, 17 Sep 2019 10:40:49 -0500
> Denis Kenzior <denkenz@gmail.com> wrote:
> 
>> Hi Kalle,
>>
>>> For user experience scanning and DHCP are also important, what kind of > numbers you get when those are included? No need to have anything> precise, I would like just to get an understanding where we are> nowadays.
>>
>> Scanning heavily depends on the RF environment and the hardware.  In our testing ath9k takes stupid long to scan for example.
>>
>> But in a sort of best case scenario, using limited scan and no mac change, iwd connects in ~300ms.  People have reported that they have not finished opening their laptop screen and they're connected, so at that level of latency, every millisecond is important and totally worth fighting for.  Randomizing the MAC would penalize our connection times by 2X (300 ms at least).  And Android folks have reported the penalty to be as high as 3 seconds.  So this needs to be fixed.  And do note that this is a feature every modern OS implements by default.
> 
> Randomizing the MAC is a stupid decision.
> Do you realy expect that this will add something to the security?

It allows one to be more anonymous.

Thanks,
Ben

> 
> B. Marčan
>
Marcel Holtmann Sept. 17, 2019, 7:05 p.m. UTC | #13
Hi Bob,

>>> For user experience scanning and DHCP are also important, what kind of > numbers you get when those are included? No need to have anything> precise, I would like just to get an understanding where we are> nowadays.  
>> 
>> Scanning heavily depends on the RF environment and the hardware.  In our testing ath9k takes stupid long to scan for example.
>> 
>> But in a sort of best case scenario, using limited scan and no mac change, iwd connects in ~300ms.  People have reported that they have not finished opening their laptop screen and they're connected, so at that level of latency, every millisecond is important and totally worth fighting for.  Randomizing the MAC would penalize our connection times by 2X (300 ms at least).  And Android folks have reported the penalty to be as high as 3 seconds.  So this needs to be fixed.  And do note that this is a feature every modern OS implements by default.
> 
> Randomizing the MAC is a stupid decision.
> Do you realy expect that this will add something to the security?

the mac randomization is about privacy.

Regards

Marcel
Steve deRosier Sept. 17, 2019, 7:11 p.m. UTC | #14
On Tue, Sep 17, 2019 at 11:44 AM Bob Marcan <bob.marcan@gmail.com> wrote:
>
> On Tue, 17 Sep 2019 10:40:49 -0500
> Denis Kenzior <denkenz@gmail.com> wrote:
>
> > Hi Kalle,
> >
> > > For user experience scanning and DHCP are also important, what kind of > numbers you get when those are included? No need to have anything> precise, I would like just to get an understanding where we are> nowadays.
> >
> > Scanning heavily depends on the RF environment and the hardware.  In our testing ath9k takes stupid long to scan for example.
> >
> > But in a sort of best case scenario, using limited scan and no mac change, iwd connects in ~300ms.  People have reported that they have not finished opening their laptop screen and they're connected, so at that level of latency, every millisecond is important and totally worth fighting for.  Randomizing the MAC would penalize our connection times by 2X (300 ms at least).  And Android folks have reported the penalty to be as high as 3 seconds.  So this needs to be fixed.  And do note that this is a feature every modern OS implements by default.
>
> Randomizing the MAC is a stupid decision.
> Do you realy expect that this will add something to the security?
>

Hi Bob,

Thank you for your post, but this is an opinion and adds nothing to
the technical discussion underway. Let's discuss the patches please.

Thanks,
- Steve