Message ID | 20220510155828.9406-1-alexander@wetzel-home.de (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Johannes Berg |
Headers | show |
Series | mac80211: Simplify queue selection | expand |
On 10.05.22 17:58, Alexander Wetzel wrote: > Let iTXQ drivers also register four queues in netdev and move queue > assignment to ndo_select_queue(), like it's done for other drivers. > > This gets rid of a special case in mac80211 and also increases the > chance that when we call skb_get_hash() the skb is still hot in the CPU > buffers. > > Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> This has the disadvantage of requiring a redundant sta lookup in the tx path for iTXQ drivers. I think the CPU cost of that one is probably higher than any potential gain from calling skb_get_hash a bit earlier. - Felix
On 10.05.22 18:10, Felix Fietkau wrote: > On 10.05.22 17:58, Alexander Wetzel wrote: >> Let iTXQ drivers also register four queues in netdev and move queue >> assignment to ndo_select_queue(), like it's done for other drivers. >> >> This gets rid of a special case in mac80211 and also increases the >> chance that when we call skb_get_hash() the skb is still hot in the CPU >> buffers. >> >> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> > > This has the disadvantage of requiring a redundant sta lookup in the tx > path for iTXQ drivers. I think the CPU cost of that one is probably > higher than any potential gain from calling skb_get_hash a bit earlier. Found that one, yes. But why do we then not drop ndo_select_queue() for all drivers? Or maybe just call skb_get_hash() in ndo_select_queue()... But I guess then it would make more sense to move the ndo_select_queue() into netdev, so all drivers get the optimization. Alexander
On 10.05.22 18:13, Alexander Wetzel wrote: > On 10.05.22 18:10, Felix Fietkau wrote: >> On 10.05.22 17:58, Alexander Wetzel wrote: >>> Let iTXQ drivers also register four queues in netdev and move queue >>> assignment to ndo_select_queue(), like it's done for other drivers. >>> >>> This gets rid of a special case in mac80211 and also increases the >>> chance that when we call skb_get_hash() the skb is still hot in the CPU >>> buffers. >>> >>> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> >> >> This has the disadvantage of requiring a redundant sta lookup in the tx >> path for iTXQ drivers. I think the CPU cost of that one is probably >> higher than any potential gain from calling skb_get_hash a bit earlier. > > Found that one, yes. But why do we then not drop ndo_select_queue() for > all drivers? > > Or maybe just call skb_get_hash() in ndo_select_queue()... But I guess > then it would make more sense to move the ndo_select_queue() into > netdev, so all drivers get the optimization. When not using iTXQ, packets are buffered in netdev qdisc. In order for that to work, ndo_select_queue needs to be called *before* packets are put into qdisc (so long before the actual mac80211 xmit handler). To fix this properly, we'd need to move to iTXQ for all drivers (by having mac80211 push packets via drv_tx calls after pulling from iTXQ). This can probably be done without having to modify the drivers. - Felix
On Tue, 2022-05-10 at 18:21 +0200, Felix Fietkau wrote: > > To fix this properly, we'd need to move to iTXQ for all drivers (by > having mac80211 push packets via drv_tx calls after pulling from iTXQ). > This can probably be done without having to modify the drivers. > I looked at this, and I kind of really want to do that, but it's not _entirely_ trivial... If somebody's sufficiently motivated I'd definitely be happy :) johannes
Alexander Wetzel <alexander@wetzel-home.de> writes: > I still don't understand why we don't want to use qdisc with the iTXQ > drivers. I now just made sure we don't start using qdiscs with this > patch to start with the least invasive approach. Anyone able to shed > some light on that? Because of aggregation, basically. To build an aggregate you need to be able to pull several packets that has the same STA+TID; if all packets on the interface are just sitting in a qdisc you have no interface to do this. Before switching to TXQs, drivers (the ath9k in particular) would solve this by having another layer of per-sta queueing inside the driver, which was just a dump FIFO that added a bunch of latency. The iTXQs moved that up to mac80211, while doing proper queue management (flow queueing + CoDel derived from the FQ-CoDel qdisc) at the same time; this made the qdisc layer redundant for most purposes, which is why we switched to noqueue by default. -Toke
Alexander Wetzel <alexander@wetzel-home.de> writes: > On 10.05.22 18:10, Felix Fietkau wrote: >> On 10.05.22 17:58, Alexander Wetzel wrote: >>> Let iTXQ drivers also register four queues in netdev and move queue >>> assignment to ndo_select_queue(), like it's done for other drivers. >>> >>> This gets rid of a special case in mac80211 and also increases the >>> chance that when we call skb_get_hash() the skb is still hot in the CPU >>> buffers. >>> >>> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> >> >> This has the disadvantage of requiring a redundant sta lookup in the tx >> path for iTXQ drivers. I think the CPU cost of that one is probably >> higher than any potential gain from calling skb_get_hash a bit earlier. > > Found that one, yes. But why do we then not drop ndo_select_queue() for > all drivers? > > Or maybe just call skb_get_hash() in ndo_select_queue()... But I guess > then it would make more sense to move the ndo_select_queue() into > netdev, so all drivers get the optimization. The reason it's an optimisation for mac80211 is that we know skb_get_hash() will always be called later (when enqueueing into the TXQ). This is not always the case for other drivers, so that is probably a dubious optimisation to have in general. -Toke
On Tue, 2022-05-10 at 18:10 +0200, Felix Fietkau wrote: > On 10.05.22 17:58, Alexander Wetzel wrote: > > Let iTXQ drivers also register four queues in netdev and move queue > > assignment to ndo_select_queue(), like it's done for other drivers. > > > > This gets rid of a special case in mac80211 and also increases the > > chance that when we call skb_get_hash() the skb is still hot in the CPU > > buffers. > > > > Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> > > This has the disadvantage of requiring a redundant sta lookup in the tx > path for iTXQ drivers. I think the CPU cost of that one is probably > higher than any potential gain from calling skb_get_hash a bit earlier. > However, that's independent - we can still calculate the hash there, and then bail out, i.e. put it right before the "if (wake_tx_queue) return" part, no? OTOH we don't even need the hash in the other cases, do we? johannes
Johannes Berg <johannes@sipsolutions.net> writes: > On Tue, 2022-05-10 at 18:10 +0200, Felix Fietkau wrote: >> On 10.05.22 17:58, Alexander Wetzel wrote: >> > Let iTXQ drivers also register four queues in netdev and move queue >> > assignment to ndo_select_queue(), like it's done for other drivers. >> > >> > This gets rid of a special case in mac80211 and also increases the >> > chance that when we call skb_get_hash() the skb is still hot in the CPU >> > buffers. >> > >> > Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> >> >> This has the disadvantage of requiring a redundant sta lookup in the tx >> path for iTXQ drivers. I think the CPU cost of that one is probably >> higher than any potential gain from calling skb_get_hash a bit earlier. >> > > However, that's independent - we can still calculate the hash there, and > then bail out, i.e. put it right before the "if (wake_tx_queue) return" > part, no? > > OTOH we don't even need the hash in the other cases, do we? Nope, that was my point on the "moving it to shared netdev core" question :) -Toke
On 10.05.22 18:22, Johannes Berg wrote: > On Tue, 2022-05-10 at 18:21 +0200, Felix Fietkau wrote: >> >> To fix this properly, we'd need to move to iTXQ for all drivers (by >> having mac80211 push packets via drv_tx calls after pulling from iTXQ). >> This can probably be done without having to modify the drivers. >> > > I looked at this, and I kind of really want to do that, but it's not > _entirely_ trivial... If somebody's sufficiently motivated I'd > definitely be happy :) No promise I'll see that trough... But I'll start to look into that. While I don't have much time for coding at the moment this sounds worthwhile and instructive at the same time. Also thanks to all for the other feedback's. I'm starting to see the woods and not only the trees:-) Alexander
diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c index 41531478437c..98b499197c41 100644 --- a/net/mac80211/iface.c +++ b/net/mac80211/iface.c @@ -2009,13 +2009,13 @@ int ieee80211_if_add(struct ieee80211_local *local, const char *name, txq_size += sizeof(struct txq_info) + local->hw.txq_data_size; - if (local->ops->wake_tx_queue) { + if (local->ops->wake_tx_queue) if_setup = ieee80211_if_setup_no_queue; - } else { + else if_setup = ieee80211_if_setup; - if (local->hw.queues >= IEEE80211_NUM_ACS) - txqs = IEEE80211_NUM_ACS; - } + + if (local->hw.queues >= IEEE80211_NUM_ACS) + txqs = IEEE80211_NUM_ACS; ndev = alloc_netdev_mqs(size + txq_size, name, name_assign_type, diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 2fabf6c4547c..2298e2e1a4ce 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -4181,12 +4181,6 @@ void __ieee80211_subif_start_xmit(struct sk_buff *skb, if (IS_ERR(sta)) sta = NULL; - if (local->ops->wake_tx_queue) { - u16 queue = __ieee80211_select_queue(sdata, sta, skb); - skb_set_queue_mapping(skb, queue); - skb_get_hash(skb); - } - ieee80211_aggr_check(sdata, sta, skb); sk_pacing_shift_update(skb->sk, sdata->local->hw.tx_sk_pacing_shift); @@ -4442,12 +4436,6 @@ static void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata, struct tid_ampdu_tx *tid_tx; u8 tid; - if (local->ops->wake_tx_queue) { - u16 queue = __ieee80211_select_queue(sdata, sta, skb); - skb_set_queue_mapping(skb, queue); - skb_get_hash(skb); - } - if (unlikely(test_bit(SCAN_SW_SCANNING, &local->scanning)) && test_bit(SDATA_STATE_OFFCHANNEL, &sdata->state)) goto out_free; @@ -5772,7 +5760,6 @@ int ieee80211_tx_control_port(struct wiphy *wiphy, struct net_device *dev, /* skb bypassed queue selection in net/core/dev.c, do it now */ skb_set_queue_mapping(skb, ieee80211_select_queue(sdata, skb)); - skb_get_hash(skb); /* mutex lock is only needed for incrementing the cookie counter */ mutex_lock(&local->mtx); diff --git a/net/mac80211/wme.c b/net/mac80211/wme.c index 774afefbe0b0..5047faee6974 100644 --- a/net/mac80211/wme.c +++ b/net/mac80211/wme.c @@ -186,9 +186,8 @@ u16 ieee80211_select_queue(struct ieee80211_sub_if_data *sdata, const u8 *ra = NULL; u16 ret; - /* when using iTXQ, we can do this later */ - if (local->ops->wake_tx_queue) - return 0; + /* Calculate hash early, hopefully it's still in the CPU buffer */ + skb_get_hash(skb); if (local->hw.queues < IEEE80211_NUM_ACS || skb->len < 6) { skb->priority = 0; /* required for correct WPA/11i MIC */
Let iTXQ drivers also register four queues in netdev and move queue assignment to ndo_select_queue(), like it's done for other drivers. This gets rid of a special case in mac80211 and also increases the chance that when we call skb_get_hash() the skb is still hot in the CPU buffers. Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de> --- This patch is currently mostly an attempt to review our queueing procedure. It's also assuming the patch offered in https://lore.kernel.org/linux-wireless/20220507083706.384513-1-alexander@wetzel-home.de/ has been merged. Without that there will be one merge conflict in ieee80211_tx_control_port(). But it only drops skb_get_hash() there, which is also in the function without the patch. I've only a basic understanding about QoS and qdisc and I may miss some obvious reason why we can't use ndo_select_queue() with iTXQ drivers. Looking at the code I just started wondering why we have all the special handling when there seems to be no pressing need. Now the additional sta lookup may make it a bit slower. But on the other side we move skb_get_hash() up in the call chain, increasing the change the skb is still in the CPU buffers. (Which according to our discussion in https://lore.kernel.org/linux-wireless/875ymf263a.fsf@toke.dk/ seems to be the only reason for the call.) '1974da8b31e6 ("mac80211: when using iTXQ, select the queue in ieee80211_subif_start_xmit")' from Felix claims "mac80211 is using its internal queues anyway". But following that logic we can always bypass ndo_select_queue(). Or even stop registering the handler at all. (Which is something I also consider.) I still don't understand why we don't want to use qdisc with the iTXQ drivers. I now just made sure we don't start using qdiscs with this patch to start with the least invasive approach. Anyone able to shed some light on that? I've also test this patch with an iwlmvm card and it looks like it's just doing what I expect: netdev is indeed accepting IEEE80211_AC_BE (2) as queue number and the warning from netdev_cap_txqueue() is not triggered. The card also looks unchanged when I check it with "tc qdisc show": qdisc noqueue 0: dev wlp0s12f0 root refcnt 2 Alexander net/mac80211/iface.c | 10 +++++----- net/mac80211/tx.c | 13 ------------- net/mac80211/wme.c | 5 ++--- 3 files changed, 7 insertions(+), 21 deletions(-)