Message ID | 153635897010.14170.2992498632345986102.stgit@alrua-x1 (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Johannes Berg |
Headers | show |
Series | Move TXQ scheduling into mac80211 | expand |
On Sat, 2018-09-08 at 00:22 +0200, Toke Høiland-Jørgensen wrote: > > Usage of the new API is optional, so drivers can be ported one at a time. With the 1:1 hardware queue/txq mapping in iwlwifi (we're close to getting that patch in, though now the Jewish holidays mean a delay), I'm not sure we'd be able to do this at all in iwlwifi. So this may not be a case of porting one at a time until we can get rid of it ... It would be nice to be able to use it, for better queue behaviour, but it would mean much more accounting in iwlwifi? Not even sure off the top of my head how to do that. johannes
On Sat, 2018-09-08 at 00:22 +0200, Toke Høiland-Jørgensen wrote: > +/** > + * ieee80211_next_txq - get next tx queue to pull packets from > + * > + * @hw: pointer as obtained from ieee80211_alloc_hw() > + * @ac: AC number to return packets from. > + * @first: Setting this to true signifies the start of a new scheduling round. > + * Each TXQ will only be returned at most exactly once in each round. > + * > + * Returns the next txq if successful, %NULL if no queue is eligible. If a txq > + * is returned, it will have been removed from the scheduler queue and needs to > + * be re-scheduled with ieee80211_schedule_txq() to continue to be active. Note > + * that the driver is responsible for locking to ensuring two different threads > + * don't schedule the same AC simultaneously. "Note that [...] to ensure [...]" would seem better to me? > + */ > +struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, s8 ac, > + bool first); Why should "ac" be signed? Is there some (undocumented) behaviour that allows for passing -1 for "don't care" or "pick highest with frames"? You said "optionally" in the commit message, but I don't think that's true, since you just use ac to index the array in the code. > +bool ieee80211_schedule_txq(struct ieee80211_hw *hw, > + struct ieee80211_txq *txq) > +{ > + struct ieee80211_local *local = hw_to_local(hw); > + struct txq_info *txqi = to_txq_info(txq); > + bool ret = false; > + > + spin_lock_bh(&local->active_txq_lock); > + > + if (list_empty(&txqi->schedule_order)) { Is there a case where it's allowed to call this while *not* empty? At least for the drivers it seems not, so perhaps when called from the driver it should WARN() here or so? I'm just a bit worried that drivers will get this a bit wrong, and we'll never know, but things won't work properly in some weird way. > + list_add_tail(&txqi->schedule_order, > + &local->active_txqs[txq->ac]); > + ret = true; > + } > + > + spin_unlock_bh(&local->active_txq_lock); > + > + return ret; > +} > +EXPORT_SYMBOL(ieee80211_schedule_txq); I'd remove the return value - the driver has no need to know that it raced with potential rescheduling in mac80211 due to a new packet, and you don't use the return value in mac80211 either. It also seems to me that you forgot to say when it should be rescheduled? As is, following the documentation would sort of makes it seem you should take the queue and put it back at the end (without any conditions), and that could lead to "scheduling loops" since empty queues would always be put back when they shouldn't be? Also, come to think of it, but I guess the next patch(es) solve(s) that - if mac80211 adds a packet while you have this queue scheduled then you would take that packet even if it's questionable it belongs to this round? From a driver perspective, I think I'd prefer an API called e.g. ieee80211_return_txq() that does the check internally if we need to schedule the queue (i.e. there are packets on it). I was going to say we could even annotate with sparse, but no - we can't because ieee80211_next_txq() may return NULL. Still, it's easier to ensure that all cleanup paths call ieee80211_return_txq() if it's not conditional, IMHO. > +struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, s8 ac, > + bool first) > +{ > + struct ieee80211_local *local = hw_to_local(hw); > + struct txq_info *txqi = NULL; > + > + spin_lock_bh(&local->active_txq_lock); > + > + if (first) > + local->schedule_round[ac]++; (here - clearly ac must be 0 <= ac < IEEE80211_NUM_ACS) > + if (list_empty(&local->active_txqs[ac])) > + goto out; > + > + txqi = list_first_entry(&local->active_txqs[ac], > + struct txq_info, > + schedule_order); > + > + if (txqi->schedule_round == local->schedule_round[ac]) > + txqi = NULL; that assigment seems unnecessary? txqi defaults to NULL. > + else > + list_del_init(&txqi->schedule_order); > + out: > + spin_unlock_bh(&local->active_txq_lock); > + > + if (!txqi) > + return NULL; > + > + txqi->schedule_round = local->schedule_round[ac]; > + return &txqi->txq; > +} > +EXPORT_SYMBOL(ieee80211_next_txq); johannes
Johannes Berg <johannes@sipsolutions.net> writes: > On Sat, 2018-09-08 at 00:22 +0200, Toke Høiland-Jørgensen wrote: >> >> Usage of the new API is optional, so drivers can be ported one at a time. > > With the 1:1 hardware queue/txq mapping in iwlwifi (we're close to > getting that patch in, though now the Jewish holidays mean a delay), > I'm not sure we'd be able to do this at all in iwlwifi. So this may > not be a case of porting one at a time until we can get rid of it ... Could you elaborate a bit more on how the hwq/txq stuff works in iwl? Does the driver just hand off a TXQ to the hardware on wake_txq(), which is then scheduled by the hardware after that? Or how does the mapping to hwqs work, and how is the hardware notified that there are still packets queued / that new packets have arrived for an already mapped txq? > It would be nice to be able to use it, for better queue behaviour, but > it would mean much more accounting in iwlwifi? Not even sure off the > top of my head how to do that. I think we'll need to have some kind of fallback airtime estimation in mac80211 that calculates airtime from packet size and rate information. Not all drivers can get this from the hardware, it seems. See my reply to Kan about the BQL-like behaviour as well. Does iwl get airtime usage information for every packet on tx complete? -Toke
Johannes Berg <johannes@sipsolutions.net> writes: > On Sat, 2018-09-08 at 00:22 +0200, Toke Høiland-Jørgensen wrote: > >> +/** >> + * ieee80211_next_txq - get next tx queue to pull packets from >> + * >> + * @hw: pointer as obtained from ieee80211_alloc_hw() >> + * @ac: AC number to return packets from. >> + * @first: Setting this to true signifies the start of a new scheduling round. >> + * Each TXQ will only be returned at most exactly once in each round. >> + * >> + * Returns the next txq if successful, %NULL if no queue is eligible. If a txq >> + * is returned, it will have been removed from the scheduler queue and needs to >> + * be re-scheduled with ieee80211_schedule_txq() to continue to be active. Note >> + * that the driver is responsible for locking to ensuring two different threads >> + * don't schedule the same AC simultaneously. > > "Note that [...] to ensure [...]" would seem better to me? > >> + */ >> +struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, s8 ac, >> + bool first); > > Why should "ac" be signed? Is there some (undocumented) behaviour that > allows for passing -1 for "don't care" or "pick highest with frames"? > > You said "optionally" in the commit message, but I don't think that's > true, since you just use ac to index the array in the code. There was in the previous version, but I removed that; guess I forgot to change the sign and the commit message ;) >> +bool ieee80211_schedule_txq(struct ieee80211_hw *hw, >> + struct ieee80211_txq *txq) >> +{ >> + struct ieee80211_local *local = hw_to_local(hw); >> + struct txq_info *txqi = to_txq_info(txq); >> + bool ret = false; >> + >> + spin_lock_bh(&local->active_txq_lock); >> + >> + if (list_empty(&txqi->schedule_order)) { > > Is there a case where it's allowed to call this while *not* empty? At > least for the drivers it seems not, so perhaps when called from the > driver it should WARN() here or so? Yeah, mac80211 calls this on wake_txq, where it will often be scheduled already. And since that can happen while drivers schedule stuff, it could also be the case that a driver re-schedules a queue that is already scheduled. > I'm just a bit worried that drivers will get this a bit wrong, and > we'll never know, but things won't work properly in some weird way. What, subtle bugs in the data path that causes hangs in hard-to-debug cases? Nah, that never happens ;) >> + list_add_tail(&txqi->schedule_order, >> + &local->active_txqs[txq->ac]); >> + ret = true; >> + } >> + >> + spin_unlock_bh(&local->active_txq_lock); >> + >> + return ret; >> +} >> +EXPORT_SYMBOL(ieee80211_schedule_txq); > > I'd remove the return value - the driver has no need to know that it > raced with potential rescheduling in mac80211 due to a new packet, and > you don't use the return value in mac80211 either. Yeah, good point; that was left over from when the call to wake_txq was conditional on this actually doing something... > It also seems to me that you forgot to say when it should be > rescheduled? As is, following the documentation would sort of makes it > seem you should take the queue and put it back at the end (without any > conditions), and that could lead to "scheduling loops" since empty > queues would always be put back when they shouldn't be? > > Also, come to think of it, but I guess the next patch(es) solve(s) that > - if mac80211 adds a packet while you have this queue scheduled then you > would take that packet even if it's questionable it belongs to this > round? > > From a driver perspective, I think I'd prefer an API called e.g. > > ieee80211_return_txq() > > that does the check internally if we need to schedule the queue (i.e. > there are packets on it). I was going to say we could even annotate with > sparse, but no - we can't because ieee80211_next_txq() may return NULL. > Still, it's easier to ensure that all cleanup paths call > ieee80211_return_txq() if it's not conditional, IMHO. Good idea, will do. >> +struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, s8 ac, >> + bool first) >> +{ >> + struct ieee80211_local *local = hw_to_local(hw); >> + struct txq_info *txqi = NULL; >> + >> + spin_lock_bh(&local->active_txq_lock); >> + >> + if (first) >> + local->schedule_round[ac]++; > > (here - clearly ac must be 0 <= ac < IEEE80211_NUM_ACS) > >> + if (list_empty(&local->active_txqs[ac])) >> + goto out; >> + >> + txqi = list_first_entry(&local->active_txqs[ac], >> + struct txq_info, >> + schedule_order); >> + >> + if (txqi->schedule_round == local->schedule_round[ac]) >> + txqi = NULL; > > that assigment seems unnecessary? txqi defaults to NULL. No, this is an 'undo' of the previous line. I.e., we get the first txqi, check if we've already seen it this round, and if we have, unset txqi so the function returns NULL (causing the driver to stop looping). -Toke
On Mon, 2018-09-10 at 12:57 +0200, Toke Høiland-Jørgensen wrote: > Johannes Berg <johannes@sipsolutions.net> writes: > > > On Sat, 2018-09-08 at 00:22 +0200, Toke Høiland-Jørgensen wrote: > > > > > > Usage of the new API is optional, so drivers can be ported one at a time. > > > > With the 1:1 hardware queue/txq mapping in iwlwifi (we're close to > > getting that patch in, though now the Jewish holidays mean a delay), > > I'm not sure we'd be able to do this at all in iwlwifi. So this may > > not be a case of porting one at a time until we can get rid of it ... > > Could you elaborate a bit more on how the hwq/txq stuff works in iwl? I can try. > Does the driver just hand off a TXQ to the hardware on wake_txq(), which > is then scheduled by the hardware after that? Or how does the mapping to > hwqs work, and how is the hardware notified that there are still packets > queued / that new packets have arrived for an already mapped txq? Basically, we use the TXQ driver data to do this. On initialization, we set all TXQs to have no hardware queue mapped. Then, when the first wake happens for this TXQ, we allocate a hardware queue for this RA/TID. From then on, right now we basically just try to refill the hardware queue from the TXQ whenever the hardware releases frames from that queue. If there are none, we fall back to putting them on the hardware queue from the wake signal. > > It would be nice to be able to use it, for better queue behaviour, but > > it would mean much more accounting in iwlwifi? Not even sure off the > > top of my head how to do that. > > I think we'll need to have some kind of fallback airtime estimation in > mac80211 that calculates airtime from packet size and rate information. > Not all drivers can get this from the hardware, it seems. See my reply > to Kan about the BQL-like behaviour as well. > > Does iwl get airtime usage information for every packet on tx complete? Yes, we do. johannes
On Mon, 2018-09-10 at 13:02 +0200, Toke Høiland-Jørgensen wrote: > > > Is there a case where it's allowed to call this while *not* empty? At > > least for the drivers it seems not, so perhaps when called from the > > driver it should WARN() here or so? > > Yeah, mac80211 calls this on wake_txq, where it will often be scheduled > already. And since that can happen while drivers schedule stuff, it > could also be the case that a driver re-schedules a queue that is > already scheduled. Right, I kinda gathered that but was thinking more from a driver POV as I was reviewing, makes sense. > > I'm just a bit worried that drivers will get this a bit wrong, and > > we'll never know, but things won't work properly in some weird way. > > What, subtle bugs in the data path that causes hangs in hard-to-debug > cases? Nah, that never happens ;) Good :-P Actually though, we can't put a warn on there, because the driver might take a txq, then mac80211 puts it on the list due to a new packet, and then the driver also returns it. > > > + txqi = list_first_entry(&local->active_txqs[ac], > > > + struct txq_info, > > > + schedule_order); > > > + > > > + if (txqi->schedule_round == local->schedule_round[ac]) > > > + txqi = NULL; > > > > that assigment seems unnecessary? txqi defaults to NULL. > > No, this is an 'undo' of the previous line. I.e., we get the first txqi, > check if we've already seen it this round, and if we have, unset txqi so > the function returns NULL (causing the driver to stop looping). Err, yeah, obviously. johannes
Johannes Berg <johannes@sipsolutions.net> writes: > On Mon, 2018-09-10 at 12:57 +0200, Toke Høiland-Jørgensen wrote: >> Johannes Berg <johannes@sipsolutions.net> writes: >> >> > On Sat, 2018-09-08 at 00:22 +0200, Toke Høiland-Jørgensen wrote: >> > > >> > > Usage of the new API is optional, so drivers can be ported one at a time. >> > >> > With the 1:1 hardware queue/txq mapping in iwlwifi (we're close to >> > getting that patch in, though now the Jewish holidays mean a delay), >> > I'm not sure we'd be able to do this at all in iwlwifi. So this may >> > not be a case of porting one at a time until we can get rid of it ... >> >> Could you elaborate a bit more on how the hwq/txq stuff works in iwl? > > I can try. > >> Does the driver just hand off a TXQ to the hardware on wake_txq(), which >> is then scheduled by the hardware after that? Or how does the mapping to >> hwqs work, and how is the hardware notified that there are still packets >> queued / that new packets have arrived for an already mapped txq? > > Basically, we use the TXQ driver data to do this. On initialization, we > set all TXQs to have no hardware queue mapped. Then, when the first wake > happens for this TXQ, we allocate a hardware queue for this RA/TID. > > From then on, right now we basically just try to refill the hardware > queue from the TXQ whenever the hardware releases frames from that > queue. If there are none, we fall back to putting them on the hardware > queue from the wake signal. OK. So if the TXQ is ineligible to dequeue packets, but still have them queued, there would need to be a signal (such as wake_txq()) when it becomes eligible again, right? -Toke
On Mon, 2018-09-10 at 14:39 +0200, Toke Høiland-Jørgensen wrote: > > From then on, right now we basically just try to refill the hardware > > queue from the TXQ whenever the hardware releases frames from that > > queue. If there are none, we fall back to putting them on the hardware > > queue from the wake signal. > > OK. So if the TXQ is ineligible to dequeue packets, but still have them > queued, there would need to be a signal (such as wake_txq()) when it > becomes eligible again, right? Right. It wouldn't have to be wake_txq, but that seems as appropriate as anything else. If we were to use ieee80211_airtime_may_transmit(), we'd have to have something that tells us "I previously told you it may not transmit, but now I changed my mind" :-) johannes
Johannes Berg <johannes@sipsolutions.net> writes: > On Mon, 2018-09-10 at 14:39 +0200, Toke Høiland-Jørgensen wrote: > >> > From then on, right now we basically just try to refill the hardware >> > queue from the TXQ whenever the hardware releases frames from that >> > queue. If there are none, we fall back to putting them on the hardware >> > queue from the wake signal. >> >> OK. So if the TXQ is ineligible to dequeue packets, but still have them >> queued, there would need to be a signal (such as wake_txq()) when it >> becomes eligible again, right? > > Right. It wouldn't have to be wake_txq, but that seems as appropriate as > anything else. > > If we were to use ieee80211_airtime_may_transmit(), we'd have to have > something that tells us "I previously told you it may not transmit, but > now I changed my mind" :-) Yes, exactly. Hmm, I'll see what I can come up with :) -Toke
On Mon, 2018-09-10 at 15:08 +0200, Toke Høiland-Jørgensen wrote: > Johannes Berg <johannes@sipsolutions.net> writes: > > > On Mon, 2018-09-10 at 14:39 +0200, Toke Høiland-Jørgensen wrote: > > > > > > From then on, right now we basically just try to refill the hardware > > > > queue from the TXQ whenever the hardware releases frames from that > > > > queue. If there are none, we fall back to putting them on the hardware > > > > queue from the wake signal. > > > > > > OK. So if the TXQ is ineligible to dequeue packets, but still have them > > > queued, there would need to be a signal (such as wake_txq()) when it > > > becomes eligible again, right? > > > > Right. It wouldn't have to be wake_txq, but that seems as appropriate as > > anything else. > > > > If we were to use ieee80211_airtime_may_transmit(), we'd have to have > > something that tells us "I previously told you it may not transmit, but > > now I changed my mind" :-) > > Yes, exactly. Hmm, I'll see what I can come up with :) No need to look at this right now - let's get this stuff sorted out first :) johannes
Johannes Berg <johannes@sipsolutions.net> writes: > On Mon, 2018-09-10 at 15:08 +0200, Toke Høiland-Jørgensen wrote: >> Johannes Berg <johannes@sipsolutions.net> writes: >> >> > On Mon, 2018-09-10 at 14:39 +0200, Toke Høiland-Jørgensen wrote: >> > >> > > > From then on, right now we basically just try to refill the hardware >> > > > queue from the TXQ whenever the hardware releases frames from that >> > > > queue. If there are none, we fall back to putting them on the hardware >> > > > queue from the wake signal. >> > > >> > > OK. So if the TXQ is ineligible to dequeue packets, but still have them >> > > queued, there would need to be a signal (such as wake_txq()) when it >> > > becomes eligible again, right? >> > >> > Right. It wouldn't have to be wake_txq, but that seems as appropriate as >> > anything else. >> > >> > If we were to use ieee80211_airtime_may_transmit(), we'd have to have >> > something that tells us "I previously told you it may not transmit, but >> > now I changed my mind" :-) >> >> Yes, exactly. Hmm, I'll see what I can come up with :) > > No need to look at this right now - let's get this stuff sorted out > first :) Right, sure, I'll get the port of ath9k done first; but doesn't hurt to think about API compatibility with the other drivers at the same time as well... If we have the start_schedule() / end_schedule() pair anyway, the latter could notify any TXQs that became eligible during the scheduling round. Also, instead of having the three different API functions (next_txq()/may_tx()/schedule_txq()), we could have get_txq(txq)/put_txq(txq) which would always need to be paired; but the argument to get_txq() could be optional, and if the driver passes NULL it means "give me the next available TXQ". So for ath9k it would be: start_schedule(ac); while ((txq = get_txq(NULL)) { queue_aggregate(txq); put_txq(txq); } end_schedule(ac); And for ath10k/iwlwifi it would be: on_hw_notify(txq) { start_schedule(ac); if (txq = get_txq(txq)) { queue_packets(txq); put_txq(txq); } end_schedule(ac); } I think that would be simpler, API-wise? -Toke
On Mon, 2018-09-10 at 15:18 +0200, Toke Høiland-Jørgensen wrote: > If we have the start_schedule() / end_schedule() pair anyway, the latter > could notify any TXQs that became eligible during the scheduling round. Do we even need end_schedule()? It's hard to pass multiple things to a single call (do you build a list?), so having start_schedule(), get_txq(), return_txq() would be sufficient? > Also, instead of having the three different API functions > (next_txq()/may_tx()/schedule_txq()), we could have get_txq(txq)/put_txq(txq) > which would always need to be paired; but the argument to get_txq() > could be optional, and if the driver passes NULL it means "give me the > next available TXQ". I can't say I like this. It makes the meaning totally different: * with NULL: use the internal scheduler to determine which one is good to use next * non-NULL: essentially equivalent to may_tx() > So for ath9k it would be: > > > start_schedule(ac); > while ((txq = get_txq(NULL)) { > queue_aggregate(txq); > put_txq(txq); > } > end_schedule(ac); > > And for ath10k/iwlwifi it would be: > > on_hw_notify(txq) { > start_schedule(ac); > if (txq = get_txq(txq)) { > queue_packets(txq); > put_txq(txq); > } > end_schedule(ac); > } > > > I think that would be simpler, API-wise? I can't say I see much point in overloading get_txq() that way. You'd never use it the same way. Also, would you really start_schedule(ac) in the hw-managed case? It seems like not? Basically it seems to me that in the hw-managed case all you need is may_tx()? And in fact, once you opt in you don't even really need *that* since mac80211 can just return NULL from get_skb()? johannes
Johannes Berg <johannes@sipsolutions.net> writes: > On Mon, 2018-09-10 at 15:18 +0200, Toke Høiland-Jørgensen wrote: > >> If we have the start_schedule() / end_schedule() pair anyway, the latter >> could notify any TXQs that became eligible during the scheduling round. > > Do we even need end_schedule()? It's hard to pass multiple things to a > single call (do you build a list?), so having > > start_schedule(), get_txq(), return_txq() > > would be sufficient? Well, start_schedule() / end_schedule() would be needed if we are going to add locking in mac80211? >> Also, instead of having the three different API functions >> (next_txq()/may_tx()/schedule_txq()), we could have get_txq(txq)/put_txq(txq) >> which would always need to be paired; but the argument to get_txq() >> could be optional, and if the driver passes NULL it means "give me the >> next available TXQ". > > I can't say I like this. It makes the meaning totally different: > > * with NULL: use the internal scheduler to determine which one is good > to use next > * non-NULL: essentially equivalent to may_tx() Yeah, it'll be two completely different uses for the same function; but there wouldn't be two different APIs to keep track of. I'm fine with keeping them as separately named functions. :) >> So for ath9k it would be: >> >> >> start_schedule(ac); >> while ((txq = get_txq(NULL)) { >> queue_aggregate(txq); >> put_txq(txq); >> } >> end_schedule(ac); >> >> And for ath10k/iwlwifi it would be: >> >> on_hw_notify(txq) { >> start_schedule(ac); >> if (txq = get_txq(txq)) { >> queue_packets(txq); >> put_txq(txq); >> } >> end_schedule(ac); >> } >> >> >> I think that would be simpler, API-wise? > > I can't say I see much point in overloading get_txq() that way. You'd > never use it the same way. > > Also, would you really start_schedule(ac) in the hw-managed case? If we decide mac80211 needs to do locking to prevent two threads from scheduling the same ac, that would also be needed for the hw-managed case? > It seems like not? Basically it seems to me that in the hw-managed > case all you need is may_tx()? And in fact, once you opt in you don't > even really need *that* since mac80211 can just return NULL from > get_skb()? Yeah, we could just throttle in get_skb(); the separate call was to avoid the overhead of the check for every packet. Typically, you'll pick a TXQ, then dequeue multiple packets from it in succession; with a separate call to may_tx(), you only do the check once, not for every packet... -Toke
On Mon, 2018-09-10 at 17:00 +0200, Toke Høiland-Jørgensen wrote: > > Do we even need end_schedule()? It's hard to pass multiple things to a > > single call (do you build a list?), so having > > > > start_schedule(), get_txq(), return_txq() > > > > would be sufficient? > > Well, start_schedule() / end_schedule() would be needed if we are going > to add locking in mac80211? [...] > If we decide mac80211 needs to do locking to prevent two threads from > scheduling the same ac, that would also be needed for the hw-managed > case? Yes, good point. > > It seems like not? Basically it seems to me that in the hw-managed > > case all you need is may_tx()? And in fact, once you opt in you don't > > even really need *that* since mac80211 can just return NULL from > > get_skb()? > > Yeah, we could just throttle in get_skb(); the separate call was to > avoid the overhead of the check for every packet. Typically, you'll pick > a TXQ, then dequeue multiple packets from it in succession; with a > separate call to may_tx(), you only do the check once, not for every > packet... Yeah, also a good point. Still, txq = get_txq(txq) doesn't feel right, so better to keep that separate I think. johannes
Johannes Berg <johannes@sipsolutions.net> writes: > On Mon, 2018-09-10 at 17:00 +0200, Toke Høiland-Jørgensen wrote: > >> > Do we even need end_schedule()? It's hard to pass multiple things to a >> > single call (do you build a list?), so having >> > >> > start_schedule(), get_txq(), return_txq() >> > >> > would be sufficient? >> >> Well, start_schedule() / end_schedule() would be needed if we are going >> to add locking in mac80211? > > [...] > >> If we decide mac80211 needs to do locking to prevent two threads from >> scheduling the same ac, that would also be needed for the hw-managed >> case? > > Yes, good point. > >> > It seems like not? Basically it seems to me that in the hw-managed >> > case all you need is may_tx()? And in fact, once you opt in you don't >> > even really need *that* since mac80211 can just return NULL from >> > get_skb()? >> >> Yeah, we could just throttle in get_skb(); the separate call was to >> avoid the overhead of the check for every packet. Typically, you'll pick >> a TXQ, then dequeue multiple packets from it in succession; with a >> separate call to may_tx(), you only do the check once, not for every >> packet... > > Yeah, also a good point. > > Still, txq = get_txq(txq) doesn't feel right, so better to keep that > separate I think. Right, will do :) -Toke
diff --git a/include/net/mac80211.h b/include/net/mac80211.h index c4fadbafbf21..ff413d9caa50 100644 --- a/include/net/mac80211.h +++ b/include/net/mac80211.h @@ -108,9 +108,16 @@ * The driver is expected to initialize its private per-queue data for stations * and interfaces in the .add_interface and .sta_add ops. * - * The driver can't access the queue directly. To dequeue a frame, it calls - * ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame to a queue, it - * calls the .wake_tx_queue driver op. + * The driver can't access the queue directly. To dequeue a frame from a + * txq, it calls ieee80211_tx_dequeue(). Whenever mac80211 adds a new frame to a + * queue, it calls the .wake_tx_queue driver op. + * + * Drivers can optionally delegate responsibility for scheduling queues to + * mac80211, to take advantage of airtime fairness accounting. In this case, to + * obtain the next queue to pull frames from, the driver calls + * ieee80211_next_txq(). The driver is then expected to re-schedule the txq + * using ieee80211_schedule_txq() if it is still active after the driver has + * finished pulling packets from it. * * For AP powersave TIM handling, the driver only needs to indicate if it has * buffered packets in the driver specific data structures by calling @@ -6045,13 +6052,43 @@ void ieee80211_unreserve_tid(struct ieee80211_sta *sta, u8 tid); * ieee80211_tx_dequeue - dequeue a packet from a software tx queue * * @hw: pointer as obtained from ieee80211_alloc_hw() - * @txq: pointer obtained from station or virtual interface + * @txq: pointer obtained from station or virtual interface, or from + * ieee80211_next_txq() * * Returns the skb if successful, %NULL if no frame was available. */ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, struct ieee80211_txq *txq); +/** + * ieee80211_schedule_txq - add txq to scheduling loop + * + * @hw: pointer as obtained from ieee80211_alloc_hw() + * @txq: pointer obtained from station or virtual interface + * + * Returns %true if the txq was actually added to the scheduling, + * %false otherwise. + */ +bool ieee80211_schedule_txq(struct ieee80211_hw *hw, + struct ieee80211_txq *txq); + +/** + * ieee80211_next_txq - get next tx queue to pull packets from + * + * @hw: pointer as obtained from ieee80211_alloc_hw() + * @ac: AC number to return packets from. + * @first: Setting this to true signifies the start of a new scheduling round. + * Each TXQ will only be returned at most exactly once in each round. + * + * Returns the next txq if successful, %NULL if no queue is eligible. If a txq + * is returned, it will have been removed from the scheduler queue and needs to + * be re-scheduled with ieee80211_schedule_txq() to continue to be active. Note + * that the driver is responsible for locking to ensuring two different threads + * don't schedule the same AC simultaneously. + */ +struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, s8 ac, + bool first); + /** * ieee80211_txq_get_depth - get pending frame/byte count of given txq * diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c index 69e831bc317b..e94b1a0407af 100644 --- a/net/mac80211/agg-tx.c +++ b/net/mac80211/agg-tx.c @@ -229,7 +229,7 @@ ieee80211_agg_start_txq(struct sta_info *sta, int tid, bool enable) clear_bit(IEEE80211_TXQ_STOP, &txqi->flags); local_bh_disable(); rcu_read_lock(); - drv_wake_tx_queue(sta->sdata->local, txqi); + schedule_and_wake_txq(sta->sdata->local, txqi); rcu_read_unlock(); local_bh_enable(); } diff --git a/net/mac80211/driver-ops.h b/net/mac80211/driver-ops.h index e42c641b6190..f1578c394f75 100644 --- a/net/mac80211/driver-ops.h +++ b/net/mac80211/driver-ops.h @@ -1173,6 +1173,13 @@ static inline void drv_wake_tx_queue(struct ieee80211_local *local, local->ops->wake_tx_queue(&local->hw, &txq->txq); } +static inline void schedule_and_wake_txq(struct ieee80211_local *local, + struct txq_info *txqi) +{ + ieee80211_schedule_txq(&local->hw, &txqi->txq); + drv_wake_tx_queue(local, txqi); +} + static inline int drv_can_aggregate_in_amsdu(struct ieee80211_local *local, struct sk_buff *head, struct sk_buff *skb) diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index f40a2167935f..75f9c9ab364f 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -829,6 +829,8 @@ enum txq_info_flags { * a fq_flow which is already owned by a different tin * @def_cvars: codel vars for @def_flow * @frags: used to keep fragments created after dequeue + * @schedule_order: used with ieee80211_local->active_txqs + * @schedule_round: counter to prevent infinite loops on TXQ scheduling */ struct txq_info { struct fq_tin tin; @@ -836,6 +838,8 @@ struct txq_info { struct codel_vars def_cvars; struct codel_stats cstats; struct sk_buff_head frags; + struct list_head schedule_order; + u16 schedule_round; unsigned long flags; /* keep last! */ @@ -1127,6 +1131,11 @@ struct ieee80211_local { struct codel_vars *cvars; struct codel_params cparams; + /* protects active_txqs and txqi->schedule_order */ + spinlock_t active_txq_lock; + struct list_head active_txqs[IEEE80211_NUM_ACS]; + u16 schedule_round[IEEE80211_NUM_ACS]; + const struct ieee80211_ops *ops; /* diff --git a/net/mac80211/main.c b/net/mac80211/main.c index 77381017bac7..c6d88758cbb7 100644 --- a/net/mac80211/main.c +++ b/net/mac80211/main.c @@ -663,6 +663,10 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t priv_data_len, spin_lock_init(&local->rx_path_lock); spin_lock_init(&local->queue_stop_reason_lock); + for (i = 0; i < IEEE80211_NUM_ACS; i++) + INIT_LIST_HEAD(&local->active_txqs[i]); + spin_lock_init(&local->active_txq_lock); + INIT_LIST_HEAD(&local->chanctx_list); mutex_init(&local->chanctx_mtx); diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c index fb8c2252ac0e..c2f5cb7df54f 100644 --- a/net/mac80211/sta_info.c +++ b/net/mac80211/sta_info.c @@ -1249,7 +1249,7 @@ void ieee80211_sta_ps_deliver_wakeup(struct sta_info *sta) if (!sta->sta.txq[i] || !txq_has_queue(sta->sta.txq[i])) continue; - drv_wake_tx_queue(local, to_txq_info(sta->sta.txq[i])); + schedule_and_wake_txq(local, to_txq_info(sta->sta.txq[i])); } skb_queue_head_init(&pending); diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index c42bfa1dcd2c..8bcc447e2cbc 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -1445,6 +1445,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata, codel_vars_init(&txqi->def_cvars); codel_stats_init(&txqi->cstats); __skb_queue_head_init(&txqi->frags); + INIT_LIST_HEAD(&txqi->schedule_order); txqi->txq.vif = &sdata->vif; @@ -1485,6 +1486,9 @@ void ieee80211_txq_purge(struct ieee80211_local *local, fq_tin_reset(fq, tin, fq_skb_free_func); ieee80211_purge_tx_queue(&local->hw, &txqi->frags); + spin_lock_bh(&local->active_txq_lock); + list_del_init(&txqi->schedule_order); + spin_unlock_bh(&local->active_txq_lock); } void ieee80211_txq_set_params(struct ieee80211_local *local) @@ -1601,7 +1605,7 @@ static bool ieee80211_queue_skb(struct ieee80211_local *local, ieee80211_txq_enqueue(local, txqi, skb); spin_unlock_bh(&fq->lock); - drv_wake_tx_queue(local, txqi); + schedule_and_wake_txq(local, txqi); return true; } @@ -3623,6 +3627,60 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, } EXPORT_SYMBOL(ieee80211_tx_dequeue); +bool ieee80211_schedule_txq(struct ieee80211_hw *hw, + struct ieee80211_txq *txq) +{ + struct ieee80211_local *local = hw_to_local(hw); + struct txq_info *txqi = to_txq_info(txq); + bool ret = false; + + spin_lock_bh(&local->active_txq_lock); + + if (list_empty(&txqi->schedule_order)) { + list_add_tail(&txqi->schedule_order, + &local->active_txqs[txq->ac]); + ret = true; + } + + spin_unlock_bh(&local->active_txq_lock); + + return ret; +} +EXPORT_SYMBOL(ieee80211_schedule_txq); + +struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, s8 ac, + bool first) +{ + struct ieee80211_local *local = hw_to_local(hw); + struct txq_info *txqi = NULL; + + spin_lock_bh(&local->active_txq_lock); + + if (first) + local->schedule_round[ac]++; + + if (list_empty(&local->active_txqs[ac])) + goto out; + + txqi = list_first_entry(&local->active_txqs[ac], + struct txq_info, + schedule_order); + + if (txqi->schedule_round == local->schedule_round[ac]) + txqi = NULL; + else + list_del_init(&txqi->schedule_order); + out: + spin_unlock_bh(&local->active_txq_lock); + + if (!txqi) + return NULL; + + txqi->schedule_round = local->schedule_round[ac]; + return &txqi->txq; +} +EXPORT_SYMBOL(ieee80211_next_txq); + void __ieee80211_subif_start_xmit(struct sk_buff *skb, struct net_device *dev, u32 info_flags)
This adds an API to mac80211 to handle scheduling of TXQs and changes the interface between driver and mac80211 for TXQ handling as follows: - The wake_tx_queue callback interface includes only the ac number instead of the TXQ. The driver is expected to retrieve TXQs from ieee80211_next_txq() - Two new mac80211 functions are added: ieee80211_next_txq() and ieee80211_schedule_txq(). The former returns the next TXQ that should be scheduled, and is how the driver gets a queue to pull packets from. The latter is called internally by mac80211 to start scheduling a queue, and the driver is supposed to call it to re-schedule the TXQ after it is finished pulling packets from it (unless the queue emptied). Drivers can optionally filter TXQs on ac to support per-AC hardware queue designs. Usage of the new API is optional, so drivers can be ported one at a time. Signed-off-by: Toke Høiland-Jørgensen <toke@toke.dk> --- include/net/mac80211.h | 45 ++++++++++++++++++++++++++++++--- net/mac80211/agg-tx.c | 2 + net/mac80211/driver-ops.h | 7 +++++ net/mac80211/ieee80211_i.h | 9 +++++++ net/mac80211/main.c | 4 +++ net/mac80211/sta_info.c | 2 + net/mac80211/tx.c | 60 +++++++++++++++++++++++++++++++++++++++++++- 7 files changed, 122 insertions(+), 7 deletions(-)