diff mbox series

mac80211: Simplify queue selection

Message ID 20220510155828.9406-1-alexander@wetzel-home.de (mailing list archive)
State Changes Requested
Delegated to: Johannes Berg
Headers show
Series mac80211: Simplify queue selection | expand

Commit Message

Alexander Wetzel May 10, 2022, 3:58 p.m. UTC
Let iTXQ drivers also register four queues in netdev and move queue
assignment to ndo_select_queue(), like it's done for other drivers.

This gets rid of a special case in mac80211 and also increases the
chance that when we call skb_get_hash() the skb is still hot in the CPU
buffers.

Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
---
This patch is currently mostly an attempt to review our queueing
procedure. It's also assuming the patch offered in
https://lore.kernel.org/linux-wireless/20220507083706.384513-1-alexander@wetzel-home.de/
has been merged. Without that there will be one merge conflict in
ieee80211_tx_control_port(). But it only drops skb_get_hash() there,
which is also in the function without the patch.

I've only a basic understanding about QoS and qdisc and I may miss
some obvious reason why we can't use ndo_select_queue() with iTXQ
drivers. Looking at the code I just started wondering why we have
all the special handling when there seems to be no pressing need.
Now the additional sta lookup may make it a bit slower. But on the other
side we move skb_get_hash() up in the call chain, increasing the change
the skb is still in the CPU buffers. (Which according to our discussion
in https://lore.kernel.org/linux-wireless/875ymf263a.fsf@toke.dk/ seems
to be the only reason for the call.)

'1974da8b31e6 ("mac80211: when using iTXQ, select the queue in
ieee80211_subif_start_xmit")' from Felix claims "mac80211 is using its
internal queues anyway". But following that logic we can always bypass
ndo_select_queue(). Or even stop registering the handler at all.
(Which is something I also consider.)

I still don't understand why we don't want to use qdisc with the iTXQ
drivers. I now just made sure we don't start using qdiscs with this
patch to start with the least invasive approach. Anyone able to shed some
light on that?

I've also test this patch with an iwlmvm card and it looks like it's just
doing what I expect:
netdev is indeed accepting IEEE80211_AC_BE (2) as queue number and the
warning from netdev_cap_txqueue() is not triggered.

The card also looks unchanged when I check it with "tc qdisc show":
	qdisc noqueue 0: dev wlp0s12f0 root refcnt 2

Alexander

 net/mac80211/iface.c | 10 +++++-----
 net/mac80211/tx.c    | 13 -------------
 net/mac80211/wme.c   |  5 ++---
 3 files changed, 7 insertions(+), 21 deletions(-)

Comments

Felix Fietkau May 10, 2022, 4:10 p.m. UTC | #1
On 10.05.22 17:58, Alexander Wetzel wrote:
> Let iTXQ drivers also register four queues in netdev and move queue
> assignment to ndo_select_queue(), like it's done for other drivers.
> 
> This gets rid of a special case in mac80211 and also increases the
> chance that when we call skb_get_hash() the skb is still hot in the CPU
> buffers.
> 
> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>

This has the disadvantage of requiring a redundant sta lookup in the tx 
path for iTXQ drivers. I think the CPU cost of that one is probably 
higher than any potential gain from calling skb_get_hash a bit earlier.

- Felix
Alexander Wetzel May 10, 2022, 4:13 p.m. UTC | #2
On 10.05.22 18:10, Felix Fietkau wrote:
> On 10.05.22 17:58, Alexander Wetzel wrote:
>> Let iTXQ drivers also register four queues in netdev and move queue
>> assignment to ndo_select_queue(), like it's done for other drivers.
>>
>> This gets rid of a special case in mac80211 and also increases the
>> chance that when we call skb_get_hash() the skb is still hot in the CPU
>> buffers.
>>
>> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
> 
> This has the disadvantage of requiring a redundant sta lookup in the tx 
> path for iTXQ drivers. I think the CPU cost of that one is probably 
> higher than any potential gain from calling skb_get_hash a bit earlier.

Found that one, yes. But why do we then not drop ndo_select_queue() for 
all drivers?

Or maybe just call skb_get_hash() in ndo_select_queue()... But I guess 
then it would make more sense to move the ndo_select_queue() into 
netdev, so all drivers get the optimization.

Alexander
Felix Fietkau May 10, 2022, 4:21 p.m. UTC | #3
On 10.05.22 18:13, Alexander Wetzel wrote:
> On 10.05.22 18:10, Felix Fietkau wrote:
>> On 10.05.22 17:58, Alexander Wetzel wrote:
>>> Let iTXQ drivers also register four queues in netdev and move queue
>>> assignment to ndo_select_queue(), like it's done for other drivers.
>>>
>>> This gets rid of a special case in mac80211 and also increases the
>>> chance that when we call skb_get_hash() the skb is still hot in the CPU
>>> buffers.
>>>
>>> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
>> 
>> This has the disadvantage of requiring a redundant sta lookup in the tx 
>> path for iTXQ drivers. I think the CPU cost of that one is probably 
>> higher than any potential gain from calling skb_get_hash a bit earlier.
> 
> Found that one, yes. But why do we then not drop ndo_select_queue() for
> all drivers?
> 
> Or maybe just call skb_get_hash() in ndo_select_queue()... But I guess
> then it would make more sense to move the ndo_select_queue() into
> netdev, so all drivers get the optimization.

When not using iTXQ, packets are buffered in netdev qdisc. In order for 
that to work, ndo_select_queue needs to be called *before* packets are 
put into qdisc (so long before the actual mac80211 xmit handler).
To fix this properly, we'd need to move to iTXQ for all drivers (by 
having mac80211 push packets via drv_tx calls after pulling from iTXQ).
This can probably be done without having to modify the drivers.

- Felix
Johannes Berg May 10, 2022, 4:22 p.m. UTC | #4
On Tue, 2022-05-10 at 18:21 +0200, Felix Fietkau wrote:
> 
> To fix this properly, we'd need to move to iTXQ for all drivers (by 
> having mac80211 push packets via drv_tx calls after pulling from iTXQ).
> This can probably be done without having to modify the drivers.
> 

I looked at this, and I kind of really want to do that, but it's not
_entirely_ trivial... If somebody's sufficiently motivated I'd
definitely be happy :)

johannes
Toke Høiland-Jørgensen May 10, 2022, 6:25 p.m. UTC | #5
Alexander Wetzel <alexander@wetzel-home.de> writes:

> I still don't understand why we don't want to use qdisc with the iTXQ
> drivers. I now just made sure we don't start using qdiscs with this
> patch to start with the least invasive approach. Anyone able to shed
> some light on that?

Because of aggregation, basically. To build an aggregate you need to be
able to pull several packets that has the same STA+TID; if all packets
on the interface are just sitting in a qdisc you have no interface to do
this.

Before switching to TXQs, drivers (the ath9k in particular) would solve
this by having another layer of per-sta queueing inside the driver,
which was just a dump FIFO that added a bunch of latency. The iTXQs
moved that up to mac80211, while doing proper queue management (flow
queueing + CoDel derived from the FQ-CoDel qdisc) at the same time; this
made the qdisc layer redundant for most purposes, which is why we
switched to noqueue by default.

-Toke
Toke Høiland-Jørgensen May 10, 2022, 6:26 p.m. UTC | #6
Alexander Wetzel <alexander@wetzel-home.de> writes:

> On 10.05.22 18:10, Felix Fietkau wrote:
>> On 10.05.22 17:58, Alexander Wetzel wrote:
>>> Let iTXQ drivers also register four queues in netdev and move queue
>>> assignment to ndo_select_queue(), like it's done for other drivers.
>>>
>>> This gets rid of a special case in mac80211 and also increases the
>>> chance that when we call skb_get_hash() the skb is still hot in the CPU
>>> buffers.
>>>
>>> Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
>> 
>> This has the disadvantage of requiring a redundant sta lookup in the tx 
>> path for iTXQ drivers. I think the CPU cost of that one is probably 
>> higher than any potential gain from calling skb_get_hash a bit earlier.
>
> Found that one, yes. But why do we then not drop ndo_select_queue() for 
> all drivers?
>
> Or maybe just call skb_get_hash() in ndo_select_queue()... But I guess 
> then it would make more sense to move the ndo_select_queue() into 
> netdev, so all drivers get the optimization.

The reason it's an optimisation for mac80211 is that we know
skb_get_hash() will always be called later (when enqueueing into the
TXQ). This is not always the case for other drivers, so that is probably
a dubious optimisation to have in general.

-Toke
Johannes Berg May 10, 2022, 7:06 p.m. UTC | #7
On Tue, 2022-05-10 at 18:10 +0200, Felix Fietkau wrote:
> On 10.05.22 17:58, Alexander Wetzel wrote:
> > Let iTXQ drivers also register four queues in netdev and move queue
> > assignment to ndo_select_queue(), like it's done for other drivers.
> > 
> > This gets rid of a special case in mac80211 and also increases the
> > chance that when we call skb_get_hash() the skb is still hot in the CPU
> > buffers.
> > 
> > Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
> 
> This has the disadvantage of requiring a redundant sta lookup in the tx 
> path for iTXQ drivers. I think the CPU cost of that one is probably 
> higher than any potential gain from calling skb_get_hash a bit earlier.
> 

However, that's independent - we can still calculate the hash there, and
then bail out, i.e. put it right before the "if (wake_tx_queue) return"
part, no?

OTOH we don't even need the hash in the other cases, do we?

johannes
Toke Høiland-Jørgensen May 10, 2022, 7:27 p.m. UTC | #8
Johannes Berg <johannes@sipsolutions.net> writes:

> On Tue, 2022-05-10 at 18:10 +0200, Felix Fietkau wrote:
>> On 10.05.22 17:58, Alexander Wetzel wrote:
>> > Let iTXQ drivers also register four queues in netdev and move queue
>> > assignment to ndo_select_queue(), like it's done for other drivers.
>> > 
>> > This gets rid of a special case in mac80211 and also increases the
>> > chance that when we call skb_get_hash() the skb is still hot in the CPU
>> > buffers.
>> > 
>> > Signed-off-by: Alexander Wetzel <alexander@wetzel-home.de>
>> 
>> This has the disadvantage of requiring a redundant sta lookup in the tx 
>> path for iTXQ drivers. I think the CPU cost of that one is probably 
>> higher than any potential gain from calling skb_get_hash a bit earlier.
>> 
>
> However, that's independent - we can still calculate the hash there, and
> then bail out, i.e. put it right before the "if (wake_tx_queue) return"
> part, no?
>
> OTOH we don't even need the hash in the other cases, do we?

Nope, that was my point on the "moving it to shared netdev core"
question :)

-Toke
Alexander Wetzel May 15, 2022, 11:10 a.m. UTC | #9
On 10.05.22 18:22, Johannes Berg wrote:
> On Tue, 2022-05-10 at 18:21 +0200, Felix Fietkau wrote:
>>
>> To fix this properly, we'd need to move to iTXQ for all drivers (by
>> having mac80211 push packets via drv_tx calls after pulling from iTXQ).
>> This can probably be done without having to modify the drivers.
>>
> 
> I looked at this, and I kind of really want to do that, but it's not
> _entirely_ trivial... If somebody's sufficiently motivated I'd
> definitely be happy :)

No promise I'll see that trough... But I'll start to look into that.
While I don't have much time for coding at the moment this sounds 
worthwhile and instructive at the same time.

Also thanks to all for the other feedback's. I'm starting to see the 
woods and not only the trees:-)


Alexander
diff mbox series

Patch

diff --git a/net/mac80211/iface.c b/net/mac80211/iface.c
index 41531478437c..98b499197c41 100644
--- a/net/mac80211/iface.c
+++ b/net/mac80211/iface.c
@@ -2009,13 +2009,13 @@  int ieee80211_if_add(struct ieee80211_local *local, const char *name,
 			txq_size += sizeof(struct txq_info) +
 				    local->hw.txq_data_size;
 
-		if (local->ops->wake_tx_queue) {
+		if (local->ops->wake_tx_queue)
 			if_setup = ieee80211_if_setup_no_queue;
-		} else {
+		else
 			if_setup = ieee80211_if_setup;
-			if (local->hw.queues >= IEEE80211_NUM_ACS)
-				txqs = IEEE80211_NUM_ACS;
-		}
+
+		if (local->hw.queues >= IEEE80211_NUM_ACS)
+			txqs = IEEE80211_NUM_ACS;
 
 		ndev = alloc_netdev_mqs(size + txq_size,
 					name, name_assign_type,
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 2fabf6c4547c..2298e2e1a4ce 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -4181,12 +4181,6 @@  void __ieee80211_subif_start_xmit(struct sk_buff *skb,
 	if (IS_ERR(sta))
 		sta = NULL;
 
-	if (local->ops->wake_tx_queue) {
-		u16 queue = __ieee80211_select_queue(sdata, sta, skb);
-		skb_set_queue_mapping(skb, queue);
-		skb_get_hash(skb);
-	}
-
 	ieee80211_aggr_check(sdata, sta, skb);
 
 	sk_pacing_shift_update(skb->sk, sdata->local->hw.tx_sk_pacing_shift);
@@ -4442,12 +4436,6 @@  static void ieee80211_8023_xmit(struct ieee80211_sub_if_data *sdata,
 	struct tid_ampdu_tx *tid_tx;
 	u8 tid;
 
-	if (local->ops->wake_tx_queue) {
-		u16 queue = __ieee80211_select_queue(sdata, sta, skb);
-		skb_set_queue_mapping(skb, queue);
-		skb_get_hash(skb);
-	}
-
 	if (unlikely(test_bit(SCAN_SW_SCANNING, &local->scanning)) &&
 	    test_bit(SDATA_STATE_OFFCHANNEL, &sdata->state))
 		goto out_free;
@@ -5772,7 +5760,6 @@  int ieee80211_tx_control_port(struct wiphy *wiphy, struct net_device *dev,
 
 	/* skb bypassed queue selection in net/core/dev.c, do it now */
 	skb_set_queue_mapping(skb, ieee80211_select_queue(sdata, skb));
-	skb_get_hash(skb);
 
 	/* mutex lock is only needed for incrementing the cookie counter */
 	mutex_lock(&local->mtx);
diff --git a/net/mac80211/wme.c b/net/mac80211/wme.c
index 774afefbe0b0..5047faee6974 100644
--- a/net/mac80211/wme.c
+++ b/net/mac80211/wme.c
@@ -186,9 +186,8 @@  u16 ieee80211_select_queue(struct ieee80211_sub_if_data *sdata,
 	const u8 *ra = NULL;
 	u16 ret;
 
-	/* when using iTXQ, we can do this later */
-	if (local->ops->wake_tx_queue)
-		return 0;
+	/* Calculate hash early, hopefully it's still in the CPU buffer */
+	skb_get_hash(skb);
 
 	if (local->hw.queues < IEEE80211_NUM_ACS || skb->len < 6) {
 		skb->priority = 0; /* required for correct WPA/11i MIC */