diff mbox

Question on setting key right after the EAPOL 4/4 is sent.

Message ID 4982156c-5325-8021-dcd3-f13e02c63c72@candelatech.com (mailing list archive)
State RFC
Delegated to: Kalle Valo
Headers show

Commit Message

Ben Greear June 8, 2017, 11:17 p.m. UTC
I believe I found a problem that may be larger than my little sandbox.

The problem I see is that sometimes (and quite often when I am using lots
of vdevs and thus the NIC is busy), the keys are set before the EAPOL 4/4
hits the air.  When the key is set, the NIC will no longer transmit the
frame because of key-length issues in the tx-descriptor (ath10k wave-2
in this case).

If I add a sleep before setting the key, then it works much more
reliably.

I suspect that there is a fundamental race between the EAPOL packet-tx
logic and the key-set logic, but supplicant appears to act as though
they are natually synchronized.

So, any suggestions on how to do this right?

Thanks,
Ben

greearb@v-f24-64 hostap]$ git diff

Comments

Denis Kenzior June 8, 2017, 11:36 p.m. UTC | #1
Hi Ben,

> The problem I see is that sometimes (and quite often when I am using lots
> of vdevs and thus the NIC is busy), the keys are set before the EAPOL 4/4
> hits the air.  When the key is set, the NIC will no longer transmit the
> frame because of key-length issues in the tx-descriptor (ath10k wave-2
> in this case).

We have encountered something similar.  In our case we were seeing PAE 
packets (e.g. 4WayHandshake packet 1 of 4) before seeing the connect 
events on nl80211.

> I suspect that there is a fundamental race between the EAPOL packet-tx
> logic and the key-set logic, but supplicant appears to act as though
> they are natually synchronized.

Fundamentally there is a race between the genl/nl80211 socket to the 
kernel and the PAE socket that handles the authentication aspects.  I 
think the only way to fix this is to make sure that PAE flows over the 
genl/nl80211 socket to preserve the proper order of events.  However 
there are lots of dragons in the kernel side of this and we haven't been 
brave enough to venture into the depths yet :)

Regards,
-Denis
Ben Greear June 8, 2017, 11:43 p.m. UTC | #2
On 06/08/2017 04:36 PM, Denis Kenzior wrote:
> Hi Ben,
>
>> The problem I see is that sometimes (and quite often when I am using lots
>> of vdevs and thus the NIC is busy), the keys are set before the EAPOL 4/4
>> hits the air.  When the key is set, the NIC will no longer transmit the
>> frame because of key-length issues in the tx-descriptor (ath10k wave-2
>> in this case).
>
> We have encountered something similar.  In our case we were seeing PAE packets (e.g. 4WayHandshake packet 1 of 4) before seeing the connect events on nl80211.
>
>> I suspect that there is a fundamental race between the EAPOL packet-tx
>> logic and the key-set logic, but supplicant appears to act as though
>> they are natually synchronized.
>
> Fundamentally there is a race between the genl/nl80211 socket to the kernel and the PAE socket that handles the authentication aspects.  I think the only way to
> fix this is to make sure that PAE flows over the genl/nl80211 socket to preserve the proper order of events.  However there are lots of dragons in the kernel
> side of this and we haven't been brave enough to venture into the depths yet :)

I think that would just push the problem lower.  Probably a real fix would be to somehow propagate
the tx-status for the specific packet back to the supplicant and only then it would know that the
key could be set.

Thanks,
Ben
Denis Kenzior June 9, 2017, 12:07 a.m. UTC | #3
Hi Ben,

On 06/08/2017 06:43 PM, Ben Greear wrote:
> On 06/08/2017 04:36 PM, Denis Kenzior wrote:
>> Hi Ben,
>>
>>> The problem I see is that sometimes (and quite often when I am using
>>> lots
>>> of vdevs and thus the NIC is busy), the keys are set before the EAPOL
>>> 4/4
>>> hits the air.  When the key is set, the NIC will no longer transmit the
>>> frame because of key-length issues in the tx-descriptor (ath10k wave-2
>>> in this case).
>>
>> We have encountered something similar.  In our case we were seeing PAE
>> packets (e.g. 4WayHandshake packet 1 of 4) before seeing the connect
>> events on nl80211.
>>
>>> I suspect that there is a fundamental race between the EAPOL packet-tx
>>> logic and the key-set logic, but supplicant appears to act as though
>>> they are natually synchronized.
>>
>> Fundamentally there is a race between the genl/nl80211 socket to the
>> kernel and the PAE socket that handles the authentication aspects.  I
>> think the only way to
>> fix this is to make sure that PAE flows over the genl/nl80211 socket
>> to preserve the proper order of events.  However there are lots of
>> dragons in the kernel
>> side of this and we haven't been brave enough to venture into the
>> depths yet :)
>
> I think that would just push the problem lower.  Probably a real fix
> would be to somehow propagate
> the tx-status for the specific packet back to the supplicant and only
> then it would know that the
> key could be set.
>

Having userspace track individual packets in the kernel sounds  wrong to 
me.  This also won't help with the packets being received out-of-order. 
It would be nice if both the RX and TX ordering was preserved.  Hence my 
thinking about running PAE over NL80211.  It would then be up to the 
kernel / drivers to guarantee that the various packets are ordered 
appropriately.

Regardless of the solution, we're happy to help in order to get this fixed.

Regards,
-Denis
Johannes Berg June 9, 2017, 7:28 a.m. UTC | #4
On Thu, 2017-06-08 at 19:07 -0500, Denis Kenzior wrote:

> > > Fundamentally there is a race between the genl/nl80211 socket to
> > > the
> > > kernel and the PAE socket that handles the authentication
> > > aspects.  I
> > > think the only way to
> > > fix this is to make sure that PAE flows over the genl/nl80211
> > > socket
> > > to preserve the proper order of events.  

Correct.

> > > However there are lots of
> > > dragons in the kernel
> > > side of this and we haven't been brave enough to venture into the
> > > depths yet :)

We've actually discussed doing precisely this, for - among other things
- this reason. Just nobody stepped up yet to propose the necessary APIs
and do the remaining work to use it etc.

> > I think that would just push the problem lower.  Probably a real
> > fix
> > would be to somehow propagate
> > the tx-status for the specific packet back to the supplicant and
> > only
> > then it would know that the
> > key could be set.

That's actually possible today, with the wifi-ack sockopt. It's not
really a full solution though I think, there are other issues to solve.
We also discussed this at the last workshop, IIRC.

> Having userspace track individual packets in the kernel sounds  wrong
> to  me.  This also won't help with the packets being received out-of-
> order.  It would be nice if both the RX and TX ordering was
> preserved.  Hence my thinking about running PAE over NL80211.  It
> would then be up to the kernel / drivers to guarantee that the
> various packets are ordered appropriately.

That's actually not possible, since ordering set_key operations vs.
transmitted packets isn't something that's easily done by drivers.

However, the solution is far simpler! Once you have nl80211 PAE
transport, you can easily even set the key before transmitting the
packet and simply indicate that this particular packet should _not_ be
encrypted regardless of key presence.

johannes
Denis Kenzior June 9, 2017, 1:10 p.m. UTC | #5
Hi Johannes,

>
> We've actually discussed doing precisely this, for - among other things
> - this reason. Just nobody stepped up yet to propose the necessary APIs
> and do the remaining work to use it etc.
>

Do you have any thoughts on what the operations should look like or do 
you want me to take a stab in the dark at this?

>> Having userspace track individual packets in the kernel sounds  wrong
>> to  me.  This also won't help with the packets being received out-of-
>> order.  It would be nice if both the RX and TX ordering was
>> preserved.  Hence my thinking about running PAE over NL80211.  It
>> would then be up to the kernel / drivers to guarantee that the
>> various packets are ordered appropriately.
>
> That's actually not possible, since ordering set_key operations vs.
> transmitted packets isn't something that's easily done by drivers.

Fair enough, but at least the kernel can do its best to make sure that 
such races do not manifest themselves out into userspace.  E.g. making 
sure that PAE events arrive after the connect events, etc.

>
> However, the solution is far simpler! Once you have nl80211 PAE
> transport, you can easily even set the key before transmitting the
> packet and simply indicate that this particular packet should _not_ be
> encrypted regardless of key presence.
>

Makes sense.  Should PAE packets always be sent unencrypted?  Or should 
userspace be notified whether PAE was received unencrypted and send a 
response with the same flag?

Also, while we're on this subject.  Should the kernel auto-manage the 
LINKMODE and OPERSTATE flags?  It would seem that it already has the 
information to do so, and having userspace manage this just introduces 
another source of latency / possibility of race conditions, etc.

Regards,
-Denis
Ben Greear June 9, 2017, 1:46 p.m. UTC | #6
On 06/09/2017 12:28 AM, Johannes Berg wrote:
> On Thu, 2017-06-08 at 19:07 -0500, Denis Kenzior wrote:
>
>>>> Fundamentally there is a race between the genl/nl80211 socket to
>>>> the
>>>> kernel and the PAE socket that handles the authentication
>>>> aspects.  I
>>>> think the only way to
>>>> fix this is to make sure that PAE flows over the genl/nl80211
>>>> socket
>>>> to preserve the proper order of events.
>
> Correct.
>
>>>> However there are lots of
>>>> dragons in the kernel
>>>> side of this and we haven't been brave enough to venture into the
>>>> depths yet :)
>
> We've actually discussed doing precisely this, for - among other things
> - this reason. Just nobody stepped up yet to propose the necessary APIs
> and do the remaining work to use it etc.
>
>>> I think that would just push the problem lower.  Probably a real
>>> fix
>>> would be to somehow propagate
>>> the tx-status for the specific packet back to the supplicant and
>>> only
>>> then it would know that the
>>> key could be set.
>
> That's actually possible today, with the wifi-ack sockopt. It's not
> really a full solution though I think, there are other issues to solve.
> We also discussed this at the last workshop, IIRC.
>
>> Having userspace track individual packets in the kernel sounds  wrong
>> to  me.  This also won't help with the packets being received out-of-
>> order.  It would be nice if both the RX and TX ordering was
>> preserved.  Hence my thinking about running PAE over NL80211.  It
>> would then be up to the kernel / drivers to guarantee that the
>> various packets are ordered appropriately.
>
> That's actually not possible, since ordering set_key operations vs.
> transmitted packets isn't something that's easily done by drivers.
>
> However, the solution is far simpler! Once you have nl80211 PAE
> transport, you can easily even set the key before transmitting the
> packet and simply indicate that this particular packet should _not_ be
> encrypted regardless of key presence.

My ath10k firmware cannot deal with a case like this:

pkt is enqueued before key is set
key is set
pkt is transmitted (incorrectly)

This is because of how the tid's header-length variables are set up and modified
when the keys are set, and I don't see any good way to fix this.

Stock ath10k firmware goes to great lengths to parse EAPOL frames and try to work around
it in that manner, but that breaks .11r (or used to, I haven't tried stock firmware lately)
and adds more complexity to the code.

 From a patch someone sent to hostapd list last night, it seems we could get the tx-status for
the EAPOL 4/4, and in that case, we *know* the pkt has been transmitted, so we can then
set the key safely it would seem?

Thanks,
Ben
Johannes Berg June 9, 2017, 7:56 p.m. UTC | #7
Hi Denis,

Btw, now I finally remembered what this also fixes - the whole bridging
mess :-) When we have this, we no longer need hostapd/wpas to be aware
of bridging and capturing frames beneath the bridge - something that's
actually a regression in Linux at some point.

> Do you have any thoughts on what the operations should look like or
> do you want me to take a stab in the dark at this?

Haven't really thought about it that much. I guess it should be similar
to mgmt_tx, quite possibly even just using that same operation?
Similarly, nl80211_register_mgmt() would have to be extended.

We already have NL80211_ATTR_CONTROL_PORT_ETHERTYPE, so perhaps this
should be used for some sort of filtering, i.e. perhaps
nl80211_register_mgmt() could be used with NL80211_ATTR_CONTROL_PORT or
something... Not really sure what the best plan is. Perhaps Jouni has
some other ideas.

It does seem like this shouldn't allow any other data frames to be
intercepted, since the control_port_protocol already requires special
handling in mac80211, but we don't really want to add even more than
that, i.e. I really don't want to see arbitrary ethertype interception
at this level.

> > That's actually not possible, since ordering set_key operations vs.
> > transmitted packets isn't something that's easily done by drivers.
> 
> Fair enough, but at least the kernel can do its best to make sure
> that such races do not manifest themselves out into userspace.  E.g.
> making sure that PAE events arrive after the connect events, etc.

Yes, having the events across a common path would actually solve that
particular race - not much more though, afaict.

> Makes sense.  Should PAE packets always be sent unencrypted?  Or
> should userspace be notified whether PAE was received unencrypted and
> send a response with the same flag?

No, they shouldn't always be sent unencrypted. They're specified to be
encrypted, e.g. for GTK rekeying that's required (though some non-IEEE
protocols require unencrypted, NL80211_ATTR_CONTROL_PORT_NO_ENCRYPT.)

Since we don't use the value of this attribute for anything but the TX
path, we can go two ways here:

 1) with the new EAPOL-over-nl80211, require userspace to set the
    unencrypted flag for every frame where it's needed, i.e. even for
    rekeying frames on such non-IEEE protocols

 2) if "do not encrypt" isn't specified, use control_port_no_encrypt as
    the default value

I think I'd prefer (1), but only slightly, so perhaps it makes sense to
check what's better in the resulting code.

> Also, while we're on this subject.  Should the kernel auto-manage
> the LINKMODE and OPERSTATE flags?  It would seem that it already has
> the information to do so, and having userspace manage this just
> introduces another source of latency / possibility of race
> conditions, etc.

I'm not convinced that it does have that information in all cases, and
I don't see how that causes race conditions or much latency, since for
client mode userspace would probably just set that together with the
NL80211_STA_FLAG_AUTHORIZED flag?

Anyway - that really should be a separate discussion and project, and
I'd prefer not to take this thread away from the original subject too
much.

johannes
Johannes Berg June 9, 2017, 8:01 p.m. UTC | #8
On Fri, 2017-06-09 at 06:46 -0700, Ben Greear wrote:

> > However, the solution is far simpler! Once you have nl80211 PAE
> > transport, you can easily even set the key before transmitting the
> > packet and simply indicate that this particular packet should _not_
> > be encrypted regardless of key presence.
> 
> My ath10k firmware cannot deal with a case like this:
> 
> pkt is enqueued before key is set
> key is set
> pkt is transmitted (incorrectly)
> 
> This is because of how the tid's header-length variables are set up
> and modified when the keys are set, and I don't see any good way to
> fix this.

That seems awful, and anyway will not work with the mentioned non-IEEE
protocols that require not encrypting the rekeying frames even when
keys have been set up.

I don't know what to tell you here, I think it'd be best if you fix
that. 

> Stock ath10k firmware goes to great lengths to parse EAPOL frames and
> try to work around it in that manner, but that breaks .11r (or used
> to, I haven't tried stock firmware lately) and adds more complexity
> to the code.

It just has to be a single flag saying "don't encrypt this frame" -
nothing super complicated about that?

In ath10k it looks like HTT_DATA_TX_DESC_FLAGS0_NO_ENCRYPT gets set for
this, seems easy enough?

>  From a patch someone sent to hostapd list last night, it seems we
> could get the tx-status for the EAPOL 4/4, and in that case, we
> *know* the pkt has been transmitted, so we can then set the key
> safely it would seem?

I think so, and I don't remember why we dismissed this solution. Could
be that we just decided solving the bridging issue at the same time,
while not introducing more latency, was better.

Also, the other way can possibly solve some PTK rekeying issues, so
overall the solution to go all the way seems better.

johannes
Ben Greear June 9, 2017, 8:18 p.m. UTC | #9
On 06/09/2017 01:01 PM, Johannes Berg wrote:
> On Fri, 2017-06-09 at 06:46 -0700, Ben Greear wrote:
>
>>> However, the solution is far simpler! Once you have nl80211 PAE
>>> transport, you can easily even set the key before transmitting the
>>> packet and simply indicate that this particular packet should _not_
>>> be encrypted regardless of key presence.
>>
>> My ath10k firmware cannot deal with a case like this:
>>
>> pkt is enqueued before key is set
>> key is set
>> pkt is transmitted (incorrectly)
>>
>> This is because of how the tid's header-length variables are set up
>> and modified when the keys are set, and I don't see any good way to
>> fix this.
>
> That seems awful, and anyway will not work with the mentioned non-IEEE
> protocols that require not encrypting the rekeying frames even when
> keys have been set up.
>
> I don't know what to tell you here, I think it'd be best if you fix
> that.

The case that fails is basically any packet that is currently
enqueued in the firmware when the key is set is transmitted incorrectly or not at all.
And, maybe this is only for DATA tids.

Otherwise, the no-encrypt logic should work fine.  So, it is just the race
with the EAPOL 4/4 and set-key that causes issues as far as I can tell.

It looks like the EAPOL 4/4 and set-key race is fixable without too much effort,
so I think I will focus on that for now instead of further hacking special
case logic into the firmware.

>> Stock ath10k firmware goes to great lengths to parse EAPOL frames and
>> try to work around it in that manner, but that breaks .11r (or used
>> to, I haven't tried stock firmware lately) and adds more complexity
>> to the code.
>
> It just has to be a single flag saying "don't encrypt this frame" -
> nothing super complicated about that?
>
> In ath10k it looks like HTT_DATA_TX_DESC_FLAGS0_NO_ENCRYPT gets set for
> this, seems easy enough?
>
>>  From a patch someone sent to hostapd list last night, it seems we
>> could get the tx-status for the EAPOL 4/4, and in that case, we
>> *know* the pkt has been transmitted, so we can then set the key
>> safely it would seem?
>
> I think so, and I don't remember why we dismissed this solution. Could
> be that we just decided solving the bridging issue at the same time,
> while not introducing more latency, was better.
>
> Also, the other way can possibly solve some PTK rekeying issues, so
> overall the solution to go all the way seems better.

I guess I don't fully understand the PAE thing, but if you all like it,
then sounds good to me.

For kernels not supporting this new feature, it seems that having supplicant
wait for tx-status would be a good interim fix?

Thanks,
Ben
Denis Kenzior June 9, 2017, 9:42 p.m. UTC | #10
Hi Johannes,

Let me start a separate thread on this:

>> Also, while we're on this subject.  Should the kernel auto-manage
>> the LINKMODE and OPERSTATE flags?  It would seem that it already has
>> the information to do so, and having userspace manage this just
>> introduces another source of latency / possibility of race
>> conditions, etc.
>
> I'm not convinced that it does have that information in all cases, and
> I don't see how that causes race conditions or much latency, since for
> client mode userspace would probably just set that together with the
> NL80211_STA_FLAG_AUTHORIZED flag?

Well right there is a race condition.  You have 2 sockets and both 
conveying the same information?

If we can optimize away an extra kernel trap, isn't that worthwhile 
thing to do?

The most common case is issuing Dormant + UP after setting keys & 
issuing the set_station command.  Isn't setting STA_FLAG_AUTHORIZED and 
operstate to IF_OPER_UP in effect equivalent?

Oh, by the way, some drivers don't implement set_station for normal 
links (e.g. setting STA_FLAG_AUTHORIZED returns an error), should they?

The only other cases where we mess with setting linkmode I can think of are:

1. After connecting to an open network
2. After a fast transition

For 1, it would seem that the kernel can easily infer that the 
IF_OPER_UP flag should be tweaked.
For 2, this needs to be done after the new TK is installed.  Can we 
combine these steps somehow?

Regards,
-Denis
Janusz Dziedzic June 9, 2017, 9:47 p.m. UTC | #11
2017-06-09 22:18 GMT+02:00 Ben Greear <greearb@candelatech.com>:
> On 06/09/2017 01:01 PM, Johannes Berg wrote:
>>
>> On Fri, 2017-06-09 at 06:46 -0700, Ben Greear wrote:
>>
>>>> However, the solution is far simpler! Once you have nl80211 PAE
>>>> transport, you can easily even set the key before transmitting the
>>>> packet and simply indicate that this particular packet should _not_
>>>> be encrypted regardless of key presence.
>>>
>>>
>>> My ath10k firmware cannot deal with a case like this:
>>>
>>> pkt is enqueued before key is set
>>> key is set
>>> pkt is transmitted (incorrectly)
>>>
>>> This is because of how the tid's header-length variables are set up
>>> and modified when the keys are set, and I don't see any good way to
>>> fix this.
>>
>>
>> That seems awful, and anyway will not work with the mentioned non-IEEE
>> protocols that require not encrypting the rekeying frames even when
>> keys have been set up.
>>
>> I don't know what to tell you here, I think it'd be best if you fix
>> that.
>
>
> The case that fails is basically any packet that is currently
> enqueued in the firmware when the key is set is transmitted incorrectly or
> not at all.
> And, maybe this is only for DATA tids.
>
> Otherwise, the no-encrypt logic should work fine.  So, it is just the race
> with the EAPOL 4/4 and set-key that causes issues as far as I can tell.
>
> It looks like the EAPOL 4/4 and set-key race is fixable without too much
> effort,
> so I think I will focus on that for now instead of further hacking special
> case logic into the firmware.
>
Ben, as I remember correctly there is a flag,
WMI_PEER_NEED_PTK_4_WAY

This fix race (you describe) in the firmware/hw.
For 636 firmware, where we tested STA mode really hard (5 years ago),
that works perfect.
So, I am not sure why this flag don't work correctly with your firmware.

Regarding EAPOL frames I am not sure you will have real/correct ACK. I
remember in case of mgmt frames FW always return ACK - but probably
you will check that.
I also fight with some EAPOL frames, but forgot why ... Seems my brain
removed this information for some reason :-)

BR
Janusz

>>> Stock ath10k firmware goes to great lengths to parse EAPOL frames and
>>> try to work around it in that manner, but that breaks .11r (or used
>>> to, I haven't tried stock firmware lately) and adds more complexity
>>> to the code.
>>
>>
>> It just has to be a single flag saying "don't encrypt this frame" -
>> nothing super complicated about that?
>>
>> In ath10k it looks like HTT_DATA_TX_DESC_FLAGS0_NO_ENCRYPT gets set for
>> this, seems easy enough?
>>
>>>  From a patch someone sent to hostapd list last night, it seems we
>>> could get the tx-status for the EAPOL 4/4, and in that case, we
>>> *know* the pkt has been transmitted, so we can then set the key
>>> safely it would seem?
>>
>>
>> I think so, and I don't remember why we dismissed this solution. Could
>> be that we just decided solving the bridging issue at the same time,
>> while not introducing more latency, was better.
>>
>> Also, the other way can possibly solve some PTK rekeying issues, so
>> overall the solution to go all the way seems better.
>
>
> I guess I don't fully understand the PAE thing, but if you all like it,
> then sounds good to me.
>
> For kernels not supporting this new feature, it seems that having supplicant
> wait for tx-status would be a good interim fix?
>
> Thanks,
> Ben
>
> --
> Ben Greear <greearb@candelatech.com>
> Candela Technologies Inc  http://www.candelatech.com
>
>
> _______________________________________________
> Hostap mailing list
> Hostap@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/hostap
Ben Greear June 9, 2017, 10:02 p.m. UTC | #12
On 06/09/2017 02:47 PM, Janusz Dziedzic wrote:
> 2017-06-09 22:18 GMT+02:00 Ben Greear <greearb@candelatech.com>:
>> On 06/09/2017 01:01 PM, Johannes Berg wrote:
>>>
>>> On Fri, 2017-06-09 at 06:46 -0700, Ben Greear wrote:
>>>
>>>>> However, the solution is far simpler! Once you have nl80211 PAE
>>>>> transport, you can easily even set the key before transmitting the
>>>>> packet and simply indicate that this particular packet should _not_
>>>>> be encrypted regardless of key presence.
>>>>
>>>>
>>>> My ath10k firmware cannot deal with a case like this:
>>>>
>>>> pkt is enqueued before key is set
>>>> key is set
>>>> pkt is transmitted (incorrectly)
>>>>
>>>> This is because of how the tid's header-length variables are set up
>>>> and modified when the keys are set, and I don't see any good way to
>>>> fix this.
>>>
>>>
>>> That seems awful, and anyway will not work with the mentioned non-IEEE
>>> protocols that require not encrypting the rekeying frames even when
>>> keys have been set up.
>>>
>>> I don't know what to tell you here, I think it'd be best if you fix
>>> that.
>>
>>
>> The case that fails is basically any packet that is currently
>> enqueued in the firmware when the key is set is transmitted incorrectly or
>> not at all.
>> And, maybe this is only for DATA tids.
>>
>> Otherwise, the no-encrypt logic should work fine.  So, it is just the race
>> with the EAPOL 4/4 and set-key that causes issues as far as I can tell.
>>
>> It looks like the EAPOL 4/4 and set-key race is fixable without too much
>> effort,
>> so I think I will focus on that for now instead of further hacking special
>> case logic into the firmware.
>>
> Ben, as I remember correctly there is a flag,
> WMI_PEER_NEED_PTK_4_WAY
>
> This fix race (you describe) in the firmware/hw.
> For 636 firmware, where we tested STA mode really hard (5 years ago),
> that works perfect.
> So, I am not sure why this flag don't work correctly with your firmware.

I removed/ignored this flag/logic to make .11r work (in 10.1.4-mumble),
just possibly .11r worked before in the 636 firmware.

>
> Regarding EAPOL frames I am not sure you will have real/correct ACK. I
> remember in case of mgmt frames FW always return ACK - but probably
> you will check that.

I fixed the rx-status, but really, valid or not, it would at least fix the race
to wait to get the status.

> I also fight with some EAPOL frames, but forgot why ... Seems my brain
> removed this information for some reason :-)

The firmware code was nasty in this regard...I wouldn't try to remember
it if I were you :)

Thanks,
Ben
Ben Greear June 10, 2017, 4:01 p.m. UTC | #13
On 06/09/2017 04:51 PM, fengbin wrote:
> seems your firmware doesn't support HTT_DATA_TX_DESC_FLAGS0_NO_ENCRYPT?
> Otherwise , it should work perfectly because Key messages are marked as no hw encryption.

Well, it could be my bug.....but unless you have access to ath10k firmware, then
I can't explain further what is the particular issue.  If you do have access,
then I can give you some hints to follow the code path in question.

Thanks,
Ben
Arend van Spriel June 10, 2017, 7:13 p.m. UTC | #14
On 09-06-17 22:18, Ben Greear wrote:
> On 06/09/2017 01:01 PM, Johannes Berg wrote:
>> On Fri, 2017-06-09 at 06:46 -0700, Ben Greear wrote:
>>
>>>> However, the solution is far simpler! Once you have nl80211 PAE
>>>> transport, you can easily even set the key before transmitting the
>>>> packet and simply indicate that this particular packet should _not_
>>>> be encrypted regardless of key presence.
>>>
>>> My ath10k firmware cannot deal with a case like this:
>>>
>>> pkt is enqueued before key is set
>>> key is set
>>> pkt is transmitted (incorrectly)
>>>
>>> This is because of how the tid's header-length variables are set up
>>> and modified when the keys are set, and I don't see any good way to
>>> fix this.
>>
>> That seems awful, and anyway will not work with the mentioned non-IEEE
>> protocols that require not encrypting the rekeying frames even when
>> keys have been set up.
>>
>> I don't know what to tell you here, I think it'd be best if you fix
>> that.
> 
> The case that fails is basically any packet that is currently
> enqueued in the firmware when the key is set is transmitted incorrectly
> or not at all.
> And, maybe this is only for DATA tids.
> 
> Otherwise, the no-encrypt logic should work fine.  So, it is just the race
> with the EAPOL 4/4 and set-key that causes issues as far as I can tell.
> 
> It looks like the EAPOL 4/4 and set-key race is fixable without too much
> effort,
> so I think I will focus on that for now instead of further hacking special
> case logic into the firmware.
> 
>>> Stock ath10k firmware goes to great lengths to parse EAPOL frames and
>>> try to work around it in that manner, but that breaks .11r (or used
>>> to, I haven't tried stock firmware lately) and adds more complexity
>>> to the code.
>>
>> It just has to be a single flag saying "don't encrypt this frame" -
>> nothing super complicated about that?
>>
>> In ath10k it looks like HTT_DATA_TX_DESC_FLAGS0_NO_ENCRYPT gets set for
>> this, seems easy enough?
>>
>>>  From a patch someone sent to hostapd list last night, it seems we
>>> could get the tx-status for the EAPOL 4/4, and in that case, we
>>> *know* the pkt has been transmitted, so we can then set the key
>>> safely it would seem?
>>
>> I think so, and I don't remember why we dismissed this solution. Could
>> be that we just decided solving the bridging issue at the same time,
>> while not introducing more latency, was better.
>>
>> Also, the other way can possibly solve some PTK rekeying issues, so
>> overall the solution to go all the way seems better.
> 
> I guess I don't fully understand the PAE thing, but if you all like it,
> then sounds good to me.
> 
> For kernels not supporting this new feature, it seems that having
> supplicant
> wait for tx-status would be a good interim fix?

For what it is worth. In brcmfmac we block setting the PTK when EAPOL
frame is in transit exactly for this long-standing issue. So in short
this is what we do: in .start_xmit() we (atomically) increase pend_1x
count when ether proto is PAE and decrease it when transmit completes
(through brcmf_tx_finalize()). As long as pend_1x count is non-zero we
block setting the PTK.

Regards,
Arend
Johannes Berg June 13, 2017, 9:15 a.m. UTC | #15
On Fri, 2017-06-09 at 16:42 -0500, Denis Kenzior wrote:

> > I'm not convinced that it does have that information in all cases,
> > and I don't see how that causes race conditions or much latency,
> > since for client mode userspace would probably just set that
> > together with the NL80211_STA_FLAG_AUTHORIZED flag?
> 
> Well right there is a race condition.  You have 2 sockets and both 
> conveying the same information?

There's no race because we wait for the nl80211 operation to return
before even starting the rtnetlink operation.

> If we can optimize away an extra kernel trap, isn't that worthwhile 
> thing to do?

In itself, for something as rare as this? I don't really see it that
way.

> The most common case is issuing Dormant + UP after setting keys & 
> issuing the set_station command.  Isn't setting STA_FLAG_AUTHORIZED
> and operstate to IF_OPER_UP in effect equivalent?

Not entirely, since STA_FLAG_AUTHORIZED also applies to stations in
modes other than client mode, i.e. to stations other than the AP. In
almost all of those cases, the interface itself is already ready much
earlier.

> Oh, by the way, some drivers don't implement set_station for normal 
> links (e.g. setting STA_FLAG_AUTHORIZED returns an error), should
> they?

If they support encryption they probably should, but they might hook it
from getting the keys or so?

> The only other cases where we mess with setting linkmode I can think
> of are:
> 
> 1. After connecting to an open network
> 2. After a fast transition
> 
> For 1, it would seem that the kernel can easily infer that the 
> IF_OPER_UP flag should be tweaked.

Userspace should still set the AUTHORIZED flag in this case, just
possibly earlier and obviously without any key negotiation. The same is
true for WEP.

> For 2, this needs to be done after the new TK is installed.  Can we 
> combine these steps somehow?

I'm just not convinced it's worth it.


What I think we *can* do, with new EAPOL-over-nl80211 API, would be to
mandate that drivers supporting that, and when it's used, must only set
the carrier state after the controlled port is opened (AUTHORIZED
flag). That way, wpa_s wouldn't have to play with IF_OPER_DORMANT and
IF_OPER_UP at all.

johannes
diff mbox

Patch

diff --git a/src/rsn_supp/wpa.c b/src/rsn_supp/wpa.c
index 8a1d164..50a3006 100644
--- a/src/rsn_supp/wpa.c
+++ b/src/rsn_supp/wpa.c
@@ -1423,6 +1423,11 @@  static void wpa_supplicant_process_3_of_4(struct wpa_sm *sm,
         sm->renew_snonce = 1;

         if (key_info & WPA_KEY_INFO_INSTALL) {
+               /* Well now...what if the NIC hasn't actually put the 4/4 on the air
+                * yet?  If we set the key too soon, it is liable to corrupt the pkt being
+                * sent..   I don't know a good fix, but here is a hack.
+                */
+               os_sleep(0, 10000); /* sleep 10ms */
                 if (wpa_supplicant_install_ptk(sm, key))
                         goto failed;
         }