Message ID:   1421962219-6840-1-git-send-email-emmanuel.grumbach@intel.com (mailing list archive)
State:        Accepted
Delegated to: Johannes Berg
On Thu, 2015-01-22 at 23:30 +0200, Emmanuel Grumbach wrote:
> When mac80211 disconnects, it drops all the packets on the
> queues. This happens after the net stack has been notified
> that we have no link anymore (netif_carrier_off).
> netif_carrier_off ensures that no new packets are sent to
> the xmit() callback, but we might have older packets in the
> middle of the Tx path. These packets will land in the
> driver's queues after the latter have been flushed.
> A synchronize_net() between netif_carrier_off and drv_flush()
> will fix this.
>
> Note that we can't call synchronize_net() inside
> ieee80211_flush_queues() since there are flows that call
> ieee80211_flush_queues() and don't need synchronize_net(),
> which is an expensive operation.

I'm -- with a heavy heart -- applying this for now. TI/wizery spent a
lot of time optimising the time it takes to roam, and this single line
will introduce a potential for hundreds of ms of latency into the
roaming flow.

You (we) really need to think about how to avoid that.

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2015-01-23 at 10:56 +0100, Johannes Berg wrote:
> On Thu, 2015-01-22 at 23:30 +0200, Emmanuel Grumbach wrote:
> > When mac80211 disconnects, it drops all the packets on the
> > queues. This happens after the net stack has been notified
> > that we have no link anymore (netif_carrier_off).
> > netif_carrier_off ensures that no new packets are sent to
> > the xmit() callback, but we might have older packets in the
> > middle of the Tx path. These packets will land in the
> > driver's queues after the latter have been flushed.
> > A synchronize_net() between netif_carrier_off and drv_flush()
> > will fix this.
> >
> > Note that we can't call synchronize_net() inside
> > ieee80211_flush_queues() since there are flows that call
> > ieee80211_flush_queues() and don't need synchronize_net(),
> > which is an expensive operation.
>
> I'm -- with a heavy heart -- applying this for now. TI/wizery spent a
> lot of time optimising the time it takes to roam, and this single line
> will introduce a potential for hundreds of ms of latency into the
> roaming flow.
>
> You (we) really need to think about how to avoid that.

I don't think it will introduce that much latency.
Note that there is a call to flush() right after that, which means that
any driver that implements this call needs to wait there. So you have
the latency in either place. The only additional latency it adds is for
other RCU sections / packets on other interfaces.
Also - since we just stopped the netif, there can possibly be only one
packet per vif / ac in processing. This is not too much data to process.

Your call, but to be honest, I have been optimizing the roaming as well
(as you know), and it was really bad for drivers that implement the
flush() callback, especially in cases where you are already far from
the AP you want to roam from. This is why I added the drop parameter
and other optimizations (don't flush() after probing etc...).

Again - your call.
On Fri, 2015-01-23 at 10:18 +0000, Grumbach, Emmanuel wrote:
> I don't think it will introduce that much latency.
> Note that there is a call to flush() right after that, which means that
> any driver that implements this call needs to wait there. So you have
> the latency in either place. The only additional latency it adds is for
> other RCU sections / packets on other interfaces.

This is correct.

> Also - since we just stopped the netif, there can possibly be only one
> packet per vif / ac in processing. This is not too much data to
> process.

But this is the wrong conclusion.

synchronize_rcu() is expensive, no matter what, especially if you have
multiple cores. The RCU grace periods will practically never line up,
and it needs to wait for every CPU to go through one.

It's not actually just "one packet per vif/ac" - it's a whole tail of
whatever other RCU usages there are. Back when this was discussed, the
wizery people measured this at hundreds of ms in many not too uncommon
cases.

> Your call, but to be honest, I have been optimizing the roaming as well
> (as you know), and it was really bad for drivers that implement the
> flush() callback, especially in cases where you are already far from
> the AP you want to roam from. This is why I added the drop parameter
> and other optimizations (don't flush() after probing etc...)

Well, yes, but that's an upper bound - here with the synchronize_net()
we're more talking about a lower bound.

johannes
On Fri, Jan 23, 2015 at 12:33 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> On Fri, 2015-01-23 at 10:18 +0000, Grumbach, Emmanuel wrote:
>
>> I don't think it will introduce that much latency.
>> Note that there is a call to flush() right after that, which means
>> that any driver that implements this call needs to wait there. So you
>> have the latency in either place. The only additional latency it adds
>> is for other RCU sections / packets on other interfaces.
>
> This is correct.
>
>> Also - since we just stopped the netif, there can possibly be only one
>> packet per vif / ac in processing. This is not too much data to
>> process.
>
> But this is the wrong conclusion.
>
> synchronize_rcu() is expensive, no matter what, especially if you have
> multiple cores. The RCU grace periods will practically never line up,
> and it needs to wait for every CPU to go through one.
>

I know.

> It's not actually just "one packet per vif/ac" - it's a whole tail of
> whatever other RCU usages there are. Back when this was discussed, the
> wizery people measured this at hundreds of ms in many not too uncommon
> cases.
>

That's the part I didn't know. I wasn't aware that this
synchronize_net() was there and that someone removed it.

>> Your call, but to be honest, I have been optimizing the roaming as
>> well (as you know), and it was really bad for drivers that implement
>> the flush() callback, especially in cases where you are already far
>> from the AP you want to roam from. This is why I added the drop
>> parameter and other optimizations (don't flush() after probing etc...)
>
> Well, yes, but that's an upper bound - here with the synchronize_net()
> we're more talking about a lower bound.
>

I also have to admit I was wrong earlier: the flush() call in
set_disassoc has drop set to true, so this call will return
immediately. I just wonder how we handle the case where we still have
packets in the Tx path and we bring everything down. I can check the
code, but not right now.
On Fri, 2015-01-23 at 13:36 +0200, Emmanuel Grumbach wrote:
> > It's not actually just "one packet per vif/ac" - it's a whole tail of
> > whatever other RCU usages there are. Back when this was discussed,
> > the wizery people measured this at hundreds of ms in many not too
> > uncommon cases.
>
> That's the part I didn't know. I wasn't aware that this
> synchronize_net() was there and that someone removed it.

This particular one might not have been there - I don't remember. There
were some in this path and we removed quite a few.

However, if you have keys, wpa_s is currently clearing them, which
calls it for every single key. I once worked on an optimisation here
(telling wpa_s not to clear keys because we handle that when stations
are deleted anyway) but so far that didn't really go anywhere.

> I just wonder how we handle the case where we still have packets in
> the Tx path and we bring everything down. I can check the code, but
> not right now.

Well - I didn't expect you to check anything now :-)

I was more noting this down for 'future work'. It's possible, perhaps,
that we cannot do anything else here, but then we should think about
the other optimisations again...

johannes
diff --git a/net/mac80211/mlme.c b/net/mac80211/mlme.c
index 1d6bf01..1eafa63 100644
--- a/net/mac80211/mlme.c
+++ b/net/mac80211/mlme.c
@@ -2016,6 +2016,9 @@ static void ieee80211_set_disassoc(struct ieee80211_sub_if_data *sdata,
 	/* disable per-vif ps */
 	ieee80211_recalc_ps_vif(sdata);
 
+	/* flush out all packets */
+	synchronize_net();
+
 	/*
 	 * drop any frame before deauth/disassoc, this can be data or
 	 * management frame. Since we are disconnecting, we should not
When mac80211 disconnects, it drops all the packets on the queues. This
happens after the net stack has been notified that we have no link
anymore (netif_carrier_off). netif_carrier_off ensures that no new
packets are sent to the xmit() callback, but we might have older
packets in the middle of the Tx path. These packets will land in the
driver's queues after the latter have been flushed. A synchronize_net()
between netif_carrier_off and drv_flush() will fix this.

Note that we can't call synchronize_net() inside
ieee80211_flush_queues() since there are flows that call
ieee80211_flush_queues() and don't need synchronize_net(), which is an
expensive operation.

Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com>
---
 net/mac80211/mlme.c | 3 +++
 1 file changed, 3 insertions(+)