Message ID | 156889576869.191202.510507546538322707.stgit@alrua-x1 (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Johannes Berg |
Headers | show |
Series | Add Airtime Queue Limits (AQL) to mac80211 | expand |
On 9/19/19 5:22 AM, Toke Høiland-Jørgensen wrote: > From: Toke Høiland-Jørgensen <toke@redhat.com> > > This patch ports that idea over to mac80211. The basic idea is simple > enough: Whenever we dequeue a packet from the TXQs and send it to the > driver, we estimate its airtime usage, based on the last recorded TX rate > of the station that packet is destined for. The way to decide the last recorded TX rate could be vary among drivers. In terms of ath10k driver and FW, they use 4 PPDUs to update the Tx rate. Isn't it too small sampling number to be used for AQL? Peter
On 9/19/19 10:44 AM, Peter Oh wrote: > On 9/19/19 5:22 AM, Toke Høiland-Jørgensen wrote: >> From: Toke Høiland-Jørgensen <toke@redhat.com> >> >> This patch ports that idea over to mac80211. The basic idea is simple >> enough: Whenever we dequeue a packet from the TXQs and send it to the >> driver, we estimate its airtime usage, based on the last recorded TX rate >> of the station that packet is destined for. > > The way to decide the last recorded TX rate could be vary among drivers. In terms of ath10k driver and FW, they use 4 PPDUs to update the Tx rate. Isn't it too > small sampling number to be used for AQL? Probably it is not exactly the last 4 either, since the report comes back indirectly and not synchronized with the tx path? Thanks, Ben
On 9/19/19 10:46 AM, Ben Greear wrote: > On 9/19/19 10:44 AM, Peter Oh wrote: >> On 9/19/19 5:22 AM, Toke Høiland-Jørgensen wrote: >>> From: Toke Høiland-Jørgensen <toke@redhat.com> >>> >>> This patch ports that idea over to mac80211. The basic idea is simple >>> enough: Whenever we dequeue a packet from the TXQs and send it to the >>> driver, we estimate its airtime usage, based on the last recorded TX >>> rate >>> of the station that packet is destined for. >> >> The way to decide the last recorded TX rate could be vary among >> drivers. In terms of ath10k driver and FW, they use 4 PPDUs to update >> the Tx rate. Isn't it too small sampling number to be used for AQL? > > Probably it is not exactly the last 4 either, since the report comes > back indirectly and not > synchronized with the tx path? > The point of my question is "the last recorded Tx raith small nte is derived wumber of PPDUs and if it's ok to use it for AQL calculation or not". Thanks, Peter
On 9/19/19 10:54 AM, Peter Oh wrote: > > On 9/19/19 10:46 AM, Ben Greear wrote: >> On 9/19/19 10:44 AM, Peter Oh wrote: >>> On 9/19/19 5:22 AM, Toke Høiland-Jørgensen wrote: >>>> From: Toke Høiland-Jørgensen <toke@redhat.com> >>>> >>>> This patch ports that idea over to mac80211. The basic idea is simple >>>> enough: Whenever we dequeue a packet from the TXQs and send it to the >>>> driver, we estimate its airtime usage, based on the last recorded >>>> TX rate >>>> of the station that packet is destined for. >>> >>> The way to decide the last recorded TX rate could be vary among >>> drivers. In terms of ath10k driver and FW, they use 4 PPDUs to >>> update the Tx rate. Isn't it too small sampling number to be used >>> for AQL? >> >> Probably it is not exactly the last 4 either, since the report comes >> back indirectly and not >> synchronized with the tx path? >> The point of my question is "the last recorded Tx rate is derived from small number of PPDUs and if it's OK to use it for AQL calculation or not". Thanks, Peter
Peter Oh <peter.oh@eero.com> writes: > On 9/19/19 10:46 AM, Ben Greear wrote: >> On 9/19/19 10:44 AM, Peter Oh wrote: >>> On 9/19/19 5:22 AM, Toke Høiland-Jørgensen wrote: >>>> From: Toke Høiland-Jørgensen <toke@redhat.com> >>>> >>>> This patch ports that idea over to mac80211. The basic idea is simple >>>> enough: Whenever we dequeue a packet from the TXQs and send it to the >>>> driver, we estimate its airtime usage, based on the last recorded TX >>>> rate >>>> of the station that packet is destined for. >>> >>> The way to decide the last recorded TX rate could be vary among >>> drivers. In terms of ath10k driver and FW, they use 4 PPDUs to update >>> the Tx rate. Isn't it too small sampling number to be used for AQL? >> >> Probably it is not exactly the last 4 either, since the report comes >> back indirectly and not >> synchronized with the tx path? >> > The point of my question is "the last recorded Tx raith small nte is > derived wumber of PPDUs and if it's ok to use it for AQL calculation > or not". We're leaving a bit of slack in the system by limiting the buffering to two aggregates' worth of buffering instead of just one. This is to prevent starvation in case our estimate is off. In the other direction, (i.e., if the rate drops suddenly), that will translate to more bloat until the queue drains. Not much we can do about that; we can only work with the data we have... Still, the Google guys reported pretty good results using this method for ath10k with their out of tree patch. So I think that in many cases, doing this will be an improvement; obviously, it won't be perfect. But it beats the 1000 pkt+ queue limit currently in (some versions of) ath10k firmware. In an ideal world, the firmware would enforce this minimum queueing and throttle itself, but, well, sadly we don't live an ideal world... -Toke
For the record, this was the google report on their implementation in 3.18. http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf The latency vs RvR graphs at the end were to die for.
On Thu, Sep 19, 2019 at 2:18 PM Dave Taht <dave.taht@gmail.com> wrote: > > For the record, this was the google report on their implementation in 3.18. > > http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf From skimming that paper it sounds like this is shipping in the current Google WiFi product. Is that correct? /john
John Yates <john@yates-sheets.org> writes: > On Thu, Sep 19, 2019 at 2:18 PM Dave Taht <dave.taht@gmail.com> wrote: >> >> For the record, this was the google report on their implementation in 3.18. >> >> http://flent-newark.bufferbloat.net/~d/Airtime%20based%20queue%20limit%20for%20FQ_CoDel%20in%20wireless%20interface.pdf > > From skimming that paper it sounds like this is shipping in the > current Google WiFi product. Is that correct? I believe so, yeah. The chromiumos patch tracker is here: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13 -Toke
> From: Toke Høiland-Jørgensen <toke@redhat.com> > > Some devices have deep buffers in firmware and/or hardware which prevents > the FQ structure in mac80211 from effectively limiting bufferbloat on the > link. For Ethernet devices we have BQL to limit the lower-level queues, but > this cannot be applied to mac80211 because transmit rates can vary wildly > between packets depending on which station we are transmitting it to. > > To overcome this, we can use airtime-based queue limiting (AQL), where we > estimate the transmission time for each packet before dequeueing it, and > use that to limit the amount of data in-flight to the hardware. This idea > was originally implemented as part of the out-of-tree airtime fairness > patch to ath10k[0] in chromiumos. > > This patch ports that idea over to mac80211. The basic idea is simple > enough: Whenever we dequeue a packet from the TXQs and send it to the > driver, we estimate its airtime usage, based on the last recorded TX rate > of the station that packet is destined for. We keep a running per-AC total > of airtime queued for the whole device, and when that total climbs above 8 > ms' worth of data (corresponding to two maximum-sized aggregates), we > simply throttle the queues until it drops down again. > > The estimated airtime for each skb is stored in the tx_info, so we can > subtract the same amount from the running total when the skb is freed or > recycled. The throttling mechanism relies on this accounting to be > accurate (i.e., that we are not freeing skbs without subtracting any > airtime they were accounted for), so we put the subtraction into > ieee80211_report_used_skb(). > > This patch does *not* include any mechanism to wake a throttled TXQ again, > on the assumption that this will happen anyway as a side effect of whatever > freed the skb (most commonly a TX completion). > > The throttling mechanism only kicks in if the queued airtime total goes > above the limit. Since mac80211 calculates the time based on the reported > last_tx_time from the driver, the whole throttling mechanism only kicks in > for drivers that actually report this value. With the exception of > multicast, where we always calculate an estimated tx time on the assumption > that multicast is transmitted at the lowest (6 Mbps) rate. > > The throttling added in this patch is in addition to any throttling already > performed by the airtime fairness mechanism, and in principle the two > mechanisms are orthogonal (and currently also uses two different sources of > airtime). In the future, we could amend this, using the airtime estimates > calculated by this mechanism as a fallback input to the airtime fairness > scheduler, to enable airtime fairness even on drivers that don't have a > hardware source of airtime usage for each station. > > [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13/drivers/net/wireless-4.2/ath/ath10k/mac.c#3845 > > Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > --- > net/mac80211/debugfs.c | 24 ++++++++++++++++++++++++ > net/mac80211/ieee80211_i.h | 7 +++++++ > net/mac80211/status.c | 22 ++++++++++++++++++++++ > net/mac80211/tx.c | 38 +++++++++++++++++++++++++++++++++++++- > 4 files changed, 90 insertions(+), 1 deletion(-) Hi Toke, Thx a lot for working on this. Few comments inline. Regards, Lorenzo > > diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c > index 568b3b276931..c846c6e7f3e3 100644 > --- a/net/mac80211/debugfs.c > +++ b/net/mac80211/debugfs.c > @@ -148,6 +148,29 @@ static const struct file_operations aqm_ops = { > .llseek = default_llseek, > }; > [...] > @@ -3581,8 +3591,19 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, > tx.skb = skb; > tx.sdata = vif_to_sdata(info->control.vif); > > - if (txq->sta) > + pktlen = skb->len + 38; > + if (txq->sta) { > tx.sta = container_of(txq->sta, struct sta_info, sta); > + if (tx.sta->last_tx_bitrate) { > + airtime = (pktlen * 8 * 1000 * > + tx.sta->last_tx_bitrate_reciprocal) >> IEEE80211_RECIPROCAL_SHIFT; > + airtime += IEEE80211_AIRTIME_OVERHEAD; Here we are not taking into account aggregation burst size (it is done in a rough way in chromeos implementation) and tx retries. I have not carried out any tests so far but I think IEEE80211_AIRTIME_OVERHEAD will led to a significant airtime overestimation. Do you think this can be improved? (..I agree this is not a perfect world, but .. :)) Moreover, can this approach be affected by some interrupt coalescing implemented by the chipset? > + } > + } else { > + airtime = (pktlen * 8 * 1000 * > + IEEE80211_AIRTIME_MINRATE_RECIPROCAL) >> IEEE80211_RECIPROCAL_SHIFT; > + airtime += IEEE80211_AIRTIME_OVERHEAD; > + } > > /* > * The key can be removed while the packet was queued, so need to call > @@ -3659,6 +3680,15 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, > } > > IEEE80211_SKB_CB(skb)->control.vif = vif; > + > + if (airtime) { > + info->control.tx_time_est = airtime; > + > + spin_lock_bh(&local->active_txq_lock[ac]); > + local->airtime_queued[ac] += airtime; > + spin_unlock_bh(&local->active_txq_lock[ac]); > + } > + > return skb; > > out: > @@ -3676,6 +3706,9 @@ struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac) > > spin_lock_bh(&local->active_txq_lock[ac]); > > + if (local->airtime_queued[ac] > IEEE80211_AIRTIME_QUEUE_LIMIT) > + goto out; > + > begin: > txqi = list_first_entry_or_null(&local->active_txqs[ac], > struct txq_info, > @@ -3753,6 +3786,9 @@ bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw, > > spin_lock_bh(&local->active_txq_lock[ac]); > > + if (local->airtime_queued[ac] > IEEE80211_AIRTIME_QUEUE_LIMIT) > + goto out; > + > if (!txqi->txq.sta) > goto out; > >
Lorenzo Bianconi <lorenzo@kernel.org> writes: >> From: Toke Høiland-Jørgensen <toke@redhat.com> >> >> Some devices have deep buffers in firmware and/or hardware which prevents >> the FQ structure in mac80211 from effectively limiting bufferbloat on the >> link. For Ethernet devices we have BQL to limit the lower-level queues, but >> this cannot be applied to mac80211 because transmit rates can vary wildly >> between packets depending on which station we are transmitting it to. >> >> To overcome this, we can use airtime-based queue limiting (AQL), where we >> estimate the transmission time for each packet before dequeueing it, and >> use that to limit the amount of data in-flight to the hardware. This idea >> was originally implemented as part of the out-of-tree airtime fairness >> patch to ath10k[0] in chromiumos. >> >> This patch ports that idea over to mac80211. The basic idea is simple >> enough: Whenever we dequeue a packet from the TXQs and send it to the >> driver, we estimate its airtime usage, based on the last recorded TX rate >> of the station that packet is destined for. We keep a running per-AC total >> of airtime queued for the whole device, and when that total climbs above 8 >> ms' worth of data (corresponding to two maximum-sized aggregates), we >> simply throttle the queues until it drops down again. >> >> The estimated airtime for each skb is stored in the tx_info, so we can >> subtract the same amount from the running total when the skb is freed or >> recycled. The throttling mechanism relies on this accounting to be >> accurate (i.e., that we are not freeing skbs without subtracting any >> airtime they were accounted for), so we put the subtraction into >> ieee80211_report_used_skb(). >> >> This patch does *not* include any mechanism to wake a throttled TXQ again, >> on the assumption that this will happen anyway as a side effect of whatever >> freed the skb (most commonly a TX completion). >> >> The throttling mechanism only kicks in if the queued airtime total goes >> above the limit. Since mac80211 calculates the time based on the reported >> last_tx_time from the driver, the whole throttling mechanism only kicks in >> for drivers that actually report this value. With the exception of >> multicast, where we always calculate an estimated tx time on the assumption >> that multicast is transmitted at the lowest (6 Mbps) rate. >> >> The throttling added in this patch is in addition to any throttling already >> performed by the airtime fairness mechanism, and in principle the two >> mechanisms are orthogonal (and currently also uses two different sources of >> airtime). In the future, we could amend this, using the airtime estimates >> calculated by this mechanism as a fallback input to the airtime fairness >> scheduler, to enable airtime fairness even on drivers that don't have a >> hardware source of airtime usage for each station. >> >> [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13/drivers/net/wireless-4.2/ath/ath10k/mac.c#3845 >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> >> --- >> net/mac80211/debugfs.c | 24 ++++++++++++++++++++++++ >> net/mac80211/ieee80211_i.h | 7 +++++++ >> net/mac80211/status.c | 22 ++++++++++++++++++++++ >> net/mac80211/tx.c | 38 +++++++++++++++++++++++++++++++++++++- >> 4 files changed, 90 insertions(+), 1 deletion(-) > > Hi Toke, > > Thx a lot for working on this. Few comments inline. > > Regards, > Lorenzo > >> >> diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c >> index 568b3b276931..c846c6e7f3e3 100644 >> --- a/net/mac80211/debugfs.c >> +++ b/net/mac80211/debugfs.c >> @@ -148,6 +148,29 @@ static const struct file_operations aqm_ops = { >> .llseek = default_llseek, >> }; >> > > [...] > >> @@ -3581,8 +3591,19 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, >> tx.skb = skb; >> tx.sdata = vif_to_sdata(info->control.vif); >> >> - if (txq->sta) >> + pktlen = skb->len + 38; >> + if (txq->sta) { >> tx.sta = container_of(txq->sta, struct sta_info, sta); >> + if (tx.sta->last_tx_bitrate) { >> + airtime = (pktlen * 8 * 1000 * >> + tx.sta->last_tx_bitrate_reciprocal) >> IEEE80211_RECIPROCAL_SHIFT; >> + airtime += IEEE80211_AIRTIME_OVERHEAD; > > Here we are not taking into account aggregation burst size (it is done > in a rough way in chromeos implementation) and tx retries. I have not > carried out any tests so far but I think IEEE80211_AIRTIME_OVERHEAD > will led to a significant airtime overestimation. Do you think this > can be improved? (..I agree this is not a perfect world, but .. :)) Hmm, yeah, looking at this again, the way I'm going this now, I should probably have used the low 16-us IFS overhead for every packet. I guess we could do something similar to what the chromeos thing is doing. I.e., adding a single "large" overhead value when we think the packet is the first of a burst, and using the smaller value for the rest. One approach could be to couple the switch to the "scheduling rounds" we already have. I.e., first packet after a call to ieee8021_txq_schedule_start() will get the 100-us overhead, and every subsequent one will get the low one. Not sure how this will fit with what the driver actually does, though, so I guess some experimentation is in order. Ultimately, I'm not sure it matters that much whether occasionally add 80 us extra to the estimate. But as you say, adding 100 us to every packet is probably a bit much ;) > Moreover, can this approach be affected by some interrupt coalescing > implemented by the chipset? Probably? Ultimately we don't really know what exactly the chipset is doing, so we're guessing here, no? -Toke
> Lorenzo Bianconi <lorenzo@kernel.org> writes: > > >> From: Toke Høiland-Jørgensen <toke@redhat.com> > >> > >> Some devices have deep buffers in firmware and/or hardware which prevents > >> the FQ structure in mac80211 from effectively limiting bufferbloat on the > >> link. For Ethernet devices we have BQL to limit the lower-level queues, but > >> this cannot be applied to mac80211 because transmit rates can vary wildly > >> between packets depending on which station we are transmitting it to. > >> > >> To overcome this, we can use airtime-based queue limiting (AQL), where we > >> estimate the transmission time for each packet before dequeueing it, and > >> use that to limit the amount of data in-flight to the hardware. This idea > >> was originally implemented as part of the out-of-tree airtime fairness > >> patch to ath10k[0] in chromiumos. > >> > >> This patch ports that idea over to mac80211. The basic idea is simple > >> enough: Whenever we dequeue a packet from the TXQs and send it to the > >> driver, we estimate its airtime usage, based on the last recorded TX rate > >> of the station that packet is destined for. We keep a running per-AC total > >> of airtime queued for the whole device, and when that total climbs above 8 > >> ms' worth of data (corresponding to two maximum-sized aggregates), we > >> simply throttle the queues until it drops down again. > >> > >> The estimated airtime for each skb is stored in the tx_info, so we can > >> subtract the same amount from the running total when the skb is freed or > >> recycled. The throttling mechanism relies on this accounting to be > >> accurate (i.e., that we are not freeing skbs without subtracting any > >> airtime they were accounted for), so we put the subtraction into > >> ieee80211_report_used_skb(). > >> > >> This patch does *not* include any mechanism to wake a throttled TXQ again, > >> on the assumption that this will happen anyway as a side effect of whatever > >> freed the skb (most commonly a TX completion). > >> > >> The throttling mechanism only kicks in if the queued airtime total goes > >> above the limit. Since mac80211 calculates the time based on the reported > >> last_tx_time from the driver, the whole throttling mechanism only kicks in > >> for drivers that actually report this value. With the exception of > >> multicast, where we always calculate an estimated tx time on the assumption > >> that multicast is transmitted at the lowest (6 Mbps) rate. > >> > >> The throttling added in this patch is in addition to any throttling already > >> performed by the airtime fairness mechanism, and in principle the two > >> mechanisms are orthogonal (and currently also uses two different sources of > >> airtime). In the future, we could amend this, using the airtime estimates > >> calculated by this mechanism as a fallback input to the airtime fairness > >> scheduler, to enable airtime fairness even on drivers that don't have a > >> hardware source of airtime usage for each station. > >> > >> [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13/drivers/net/wireless-4.2/ath/ath10k/mac.c#3845 > >> > >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > >> --- > >> net/mac80211/debugfs.c | 24 ++++++++++++++++++++++++ > >> net/mac80211/ieee80211_i.h | 7 +++++++ > >> net/mac80211/status.c | 22 ++++++++++++++++++++++ > >> net/mac80211/tx.c | 38 +++++++++++++++++++++++++++++++++++++- > >> 4 files changed, 90 insertions(+), 1 deletion(-) > > > > Hi Toke, > > > > Thx a lot for working on this. Few comments inline. > > > > Regards, > > Lorenzo > > > >> > >> diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c > >> index 568b3b276931..c846c6e7f3e3 100644 > >> --- a/net/mac80211/debugfs.c > >> +++ b/net/mac80211/debugfs.c > >> @@ -148,6 +148,29 @@ static const struct file_operations aqm_ops = { > >> .llseek = default_llseek, > >> }; > >> > > > > [...] > > > >> @@ -3581,8 +3591,19 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, > >> tx.skb = skb; > >> tx.sdata = vif_to_sdata(info->control.vif); > >> > >> - if (txq->sta) > >> + pktlen = skb->len + 38; > >> + if (txq->sta) { > >> tx.sta = container_of(txq->sta, struct sta_info, sta); > >> + if (tx.sta->last_tx_bitrate) { > >> + airtime = (pktlen * 8 * 1000 * > >> + tx.sta->last_tx_bitrate_reciprocal) >> IEEE80211_RECIPROCAL_SHIFT; > >> + airtime += IEEE80211_AIRTIME_OVERHEAD; > > > > Here we are not taking into account aggregation burst size (it is done > > in a rough way in chromeos implementation) and tx retries. I have not > > carried out any tests so far but I think IEEE80211_AIRTIME_OVERHEAD > > will led to a significant airtime overestimation. Do you think this > > can be improved? (..I agree this is not a perfect world, but .. :)) > > Hmm, yeah, looking at this again, the way I'm going this now, I should > probably have used the low 16-us IFS overhead for every packet. > > I guess we could do something similar to what the chromeos thing is > doing. I.e., adding a single "large" overhead value when we think the > packet is the first of a burst, and using the smaller value for the > rest. > > One approach could be to couple the switch to the "scheduling rounds" we > already have. I.e., first packet after a call to > ieee8021_txq_schedule_start() will get the 100-us overhead, and every > subsequent one will get the low one. Not sure how this will fit with > what the driver actually does, though, so I guess some experimentation > is in order. > > Ultimately, I'm not sure it matters that much whether occasionally add > 80 us extra to the estimate. But as you say, adding 100 us to every > packet is probably a bit much ;) Would it be possible to use the previous tx airtime reported by the driver? (not sure if it is feasible). Some drivers can report airtime compute in hw, the issue is it can be no not linked to the given skb or aggregation burst, so we should take into account burst size > > > Moreover, can this approach be affected by some interrupt coalescing > > implemented by the chipset? > > Probably? Ultimately we don't really know what exactly the chipset is > doing, so we're guessing here, no? Here I mean if the hw relies on a 1:n tx interrupt/packet ratio (I guess most driver do), it would probably affect throughput, right? (e.g TCP) Regards, Lorenzo > > -Toke >
Lorenzo Bianconi <lorenzo@kernel.org> writes: >> Lorenzo Bianconi <lorenzo@kernel.org> writes: >> >> >> From: Toke Høiland-Jørgensen <toke@redhat.com> >> >> >> >> Some devices have deep buffers in firmware and/or hardware which prevents >> >> the FQ structure in mac80211 from effectively limiting bufferbloat on the >> >> link. For Ethernet devices we have BQL to limit the lower-level queues, but >> >> this cannot be applied to mac80211 because transmit rates can vary wildly >> >> between packets depending on which station we are transmitting it to. >> >> >> >> To overcome this, we can use airtime-based queue limiting (AQL), where we >> >> estimate the transmission time for each packet before dequeueing it, and >> >> use that to limit the amount of data in-flight to the hardware. This idea >> >> was originally implemented as part of the out-of-tree airtime fairness >> >> patch to ath10k[0] in chromiumos. >> >> >> >> This patch ports that idea over to mac80211. The basic idea is simple >> >> enough: Whenever we dequeue a packet from the TXQs and send it to the >> >> driver, we estimate its airtime usage, based on the last recorded TX rate >> >> of the station that packet is destined for. We keep a running per-AC total >> >> of airtime queued for the whole device, and when that total climbs above 8 >> >> ms' worth of data (corresponding to two maximum-sized aggregates), we >> >> simply throttle the queues until it drops down again. >> >> >> >> The estimated airtime for each skb is stored in the tx_info, so we can >> >> subtract the same amount from the running total when the skb is freed or >> >> recycled. The throttling mechanism relies on this accounting to be >> >> accurate (i.e., that we are not freeing skbs without subtracting any >> >> airtime they were accounted for), so we put the subtraction into >> >> ieee80211_report_used_skb(). >> >> >> >> This patch does *not* include any mechanism to wake a throttled TXQ again, >> >> on the assumption that this will happen anyway as a side effect of whatever >> >> freed the skb (most commonly a TX completion). >> >> >> >> The throttling mechanism only kicks in if the queued airtime total goes >> >> above the limit. Since mac80211 calculates the time based on the reported >> >> last_tx_time from the driver, the whole throttling mechanism only kicks in >> >> for drivers that actually report this value. With the exception of >> >> multicast, where we always calculate an estimated tx time on the assumption >> >> that multicast is transmitted at the lowest (6 Mbps) rate. >> >> >> >> The throttling added in this patch is in addition to any throttling already >> >> performed by the airtime fairness mechanism, and in principle the two >> >> mechanisms are orthogonal (and currently also uses two different sources of >> >> airtime). In the future, we could amend this, using the airtime estimates >> >> calculated by this mechanism as a fallback input to the airtime fairness >> >> scheduler, to enable airtime fairness even on drivers that don't have a >> >> hardware source of airtime usage for each station. >> >> >> >> [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13/drivers/net/wireless-4.2/ath/ath10k/mac.c#3845 >> >> >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> >> >> --- >> >> net/mac80211/debugfs.c | 24 ++++++++++++++++++++++++ >> >> net/mac80211/ieee80211_i.h | 7 +++++++ >> >> net/mac80211/status.c | 22 ++++++++++++++++++++++ >> >> net/mac80211/tx.c | 38 +++++++++++++++++++++++++++++++++++++- >> >> 4 files changed, 90 insertions(+), 1 deletion(-) >> > >> > Hi Toke, >> > >> > Thx a lot for working on this. Few comments inline. >> > >> > Regards, >> > Lorenzo >> > >> >> >> >> diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c >> >> index 568b3b276931..c846c6e7f3e3 100644 >> >> --- a/net/mac80211/debugfs.c >> >> +++ b/net/mac80211/debugfs.c >> >> @@ -148,6 +148,29 @@ static const struct file_operations aqm_ops = { >> >> .llseek = default_llseek, >> >> }; >> >> >> > >> > [...] >> > >> >> @@ -3581,8 +3591,19 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, >> >> tx.skb = skb; >> >> tx.sdata = vif_to_sdata(info->control.vif); >> >> >> >> - if (txq->sta) >> >> + pktlen = skb->len + 38; >> >> + if (txq->sta) { >> >> tx.sta = container_of(txq->sta, struct sta_info, sta); >> >> + if (tx.sta->last_tx_bitrate) { >> >> + airtime = (pktlen * 8 * 1000 * >> >> + tx.sta->last_tx_bitrate_reciprocal) >> IEEE80211_RECIPROCAL_SHIFT; >> >> + airtime += IEEE80211_AIRTIME_OVERHEAD; >> > >> > Here we are not taking into account aggregation burst size (it is done >> > in a rough way in chromeos implementation) and tx retries. I have not >> > carried out any tests so far but I think IEEE80211_AIRTIME_OVERHEAD >> > will led to a significant airtime overestimation. Do you think this >> > can be improved? (..I agree this is not a perfect world, but .. :)) >> >> Hmm, yeah, looking at this again, the way I'm going this now, I should >> probably have used the low 16-us IFS overhead for every packet. >> >> I guess we could do something similar to what the chromeos thing is >> doing. I.e., adding a single "large" overhead value when we think the >> packet is the first of a burst, and using the smaller value for the >> rest. >> >> One approach could be to couple the switch to the "scheduling rounds" we >> already have. I.e., first packet after a call to >> ieee8021_txq_schedule_start() will get the 100-us overhead, and every >> subsequent one will get the low one. Not sure how this will fit with >> what the driver actually does, though, so I guess some experimentation >> is in order. >> >> Ultimately, I'm not sure it matters that much whether occasionally add >> 80 us extra to the estimate. But as you say, adding 100 us to every >> packet is probably a bit much ;) > > Would it be possible to use the previous tx airtime reported by the > driver? (not sure if it is feasible). Some drivers can report airtime > compute in hw, the issue is it can be no not linked to the given skb > or aggregation burst, so we should take into account burst size That's what we do for the fairness scheduler. And yeah, if the HW can report after-the-fact airtime usage that is bound to be more accurate, so I think we should keep using that for fairness. But for this AQL thing, we really need it ahead of time. However, I don't think it's as important that it is super accurate. As long as we have a reasonable estimate I think we'll be fine. We can solve any inaccuracies by fiddling with the limit, I think. Similar to what BQL does; dynamically adjusting it up and down. So for a first pass, we can just err on the side of having the limit higher, and then iterate from there. >> > Moreover, can this approach be affected by some interrupt coalescing >> > implemented by the chipset? >> >> Probably? Ultimately we don't really know what exactly the chipset is >> doing, so we're guessing here, no? > > Here I mean if the hw relies on a 1:n tx interrupt/packet ratio (I > guess most driver do), it would probably affect throughput, right? > (e.g TCP) Yeah, this is what I alluded to above: If we set the limit too low, were are going to kill TCP throughput. Ideally, we want the limit to be as low as we can get it without hurting TCP (too much), but no lower. Just doing the conversion to airtime is a way to achieve this: This will scale the actual queue length with the achievable throughput as long as the tx rate estimate is reasonably accurate. If needed, we can add another layer of dynamic tuning on top using the existing BQL logic; but I'd like to get the basic case working first... -Toke
Hi Toke, There is an updated version of AQL in the chromiumos tree implemented in the mac80211 driver, instead of in the ath10k driver as the original version: https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7 https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703106/6 It is based on a more recent kernel (4.14) and integrated with the airtime fairness tx scheduler in mac80211. This version has been tested rather extensively. I intended to use it as the basis for my effort to bring AQL upstream, but get sidetracked by other things. I can clean it up and send a patchset next week if you think that is the right path. Sorry for the long delay and slack off on the upstream effort. There is some concern in this thread regarding the accuracy of the estimated airtime using the last reported TX rate. It is indeed a rather crude method and did not include retries in the calculation. Besides, there are lags between firmware changing rate and host driver get the rate update. The 16us IFS overhead is only correct for 5G and it is actually 10us for 2.4 G. However, that hardly matters. The goal of AQL is to prevent the firmware/hardware queue from getting bloated or starved. There is a lot of headroom in the queue length limit (8-10 ms) to tolerate inaccuracy in the estimate airtime. AQL doesn't control the fine grained TX packet scheduling. It is handled by the airtime fairness scheduler and ultimately firmware. There are two TX airtimes in the newer version (chromiumos 4.14 kernel): The estimated airtime for frames pending in the queue and the airtime reported by the firmware for the frame transmitted, which should be accurate as the firmware supposed to take retries and aggregation into account. The airtime fairness scheduler that does the TX packet scheduling should use the TX airtime reported by the firmware. That's the reason why the original implementation in the ChromiumOS tree tries to factor in aggregation size when estimate the airtime overhead and the later version doesn't even bother with that. Regards, Kan On Fri, Sep 20, 2019 at 6:32 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Lorenzo Bianconi <lorenzo@kernel.org> writes: > > >> Lorenzo Bianconi <lorenzo@kernel.org> writes: > >> > >> >> From: Toke Høiland-Jørgensen <toke@redhat.com> > >> >> > >> >> Some devices have deep buffers in firmware and/or hardware which prevents > >> >> the FQ structure in mac80211 from effectively limiting bufferbloat on the > >> >> link. For Ethernet devices we have BQL to limit the lower-level queues, but > >> >> this cannot be applied to mac80211 because transmit rates can vary wildly > >> >> between packets depending on which station we are transmitting it to. > >> >> > >> >> To overcome this, we can use airtime-based queue limiting (AQL), where we > >> >> estimate the transmission time for each packet before dequeueing it, and > >> >> use that to limit the amount of data in-flight to the hardware. This idea > >> >> was originally implemented as part of the out-of-tree airtime fairness > >> >> patch to ath10k[0] in chromiumos. > >> >> > >> >> This patch ports that idea over to mac80211. The basic idea is simple > >> >> enough: Whenever we dequeue a packet from the TXQs and send it to the > >> >> driver, we estimate its airtime usage, based on the last recorded TX rate > >> >> of the station that packet is destined for. We keep a running per-AC total > >> >> of airtime queued for the whole device, and when that total climbs above 8 > >> >> ms' worth of data (corresponding to two maximum-sized aggregates), we > >> >> simply throttle the queues until it drops down again. > >> >> > >> >> The estimated airtime for each skb is stored in the tx_info, so we can > >> >> subtract the same amount from the running total when the skb is freed or > >> >> recycled. The throttling mechanism relies on this accounting to be > >> >> accurate (i.e., that we are not freeing skbs without subtracting any > >> >> airtime they were accounted for), so we put the subtraction into > >> >> ieee80211_report_used_skb(). > >> >> > >> >> This patch does *not* include any mechanism to wake a throttled TXQ again, > >> >> on the assumption that this will happen anyway as a side effect of whatever > >> >> freed the skb (most commonly a TX completion). > >> >> > >> >> The throttling mechanism only kicks in if the queued airtime total goes > >> >> above the limit. Since mac80211 calculates the time based on the reported > >> >> last_tx_time from the driver, the whole throttling mechanism only kicks in > >> >> for drivers that actually report this value. With the exception of > >> >> multicast, where we always calculate an estimated tx time on the assumption > >> >> that multicast is transmitted at the lowest (6 Mbps) rate. > >> >> > >> >> The throttling added in this patch is in addition to any throttling already > >> >> performed by the airtime fairness mechanism, and in principle the two > >> >> mechanisms are orthogonal (and currently also uses two different sources of > >> >> airtime). In the future, we could amend this, using the airtime estimates > >> >> calculated by this mechanism as a fallback input to the airtime fairness > >> >> scheduler, to enable airtime fairness even on drivers that don't have a > >> >> hardware source of airtime usage for each station. > >> >> > >> >> [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13/drivers/net/wireless-4.2/ath/ath10k/mac.c#3845 > >> >> > >> >> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> > >> >> --- > >> >> net/mac80211/debugfs.c | 24 ++++++++++++++++++++++++ > >> >> net/mac80211/ieee80211_i.h | 7 +++++++ > >> >> net/mac80211/status.c | 22 ++++++++++++++++++++++ > >> >> net/mac80211/tx.c | 38 +++++++++++++++++++++++++++++++++++++- > >> >> 4 files changed, 90 insertions(+), 1 deletion(-) > >> > > >> > Hi Toke, > >> > > >> > Thx a lot for working on this. Few comments inline. > >> > > >> > Regards, > >> > Lorenzo > >> > > >> >> > >> >> diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c > >> >> index 568b3b276931..c846c6e7f3e3 100644 > >> >> --- a/net/mac80211/debugfs.c > >> >> +++ b/net/mac80211/debugfs.c > >> >> @@ -148,6 +148,29 @@ static const struct file_operations aqm_ops = { > >> >> .llseek = default_llseek, > >> >> }; > >> >> > >> > > >> > [...] > >> > > >> >> @@ -3581,8 +3591,19 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, > >> >> tx.skb = skb; > >> >> tx.sdata = vif_to_sdata(info->control.vif); > >> >> > >> >> - if (txq->sta) > >> >> + pktlen = skb->len + 38; > >> >> + if (txq->sta) { > >> >> tx.sta = container_of(txq->sta, struct sta_info, sta); > >> >> + if (tx.sta->last_tx_bitrate) { > >> >> + airtime = (pktlen * 8 * 1000 * > >> >> + tx.sta->last_tx_bitrate_reciprocal) >> IEEE80211_RECIPROCAL_SHIFT; > >> >> + airtime += IEEE80211_AIRTIME_OVERHEAD; > >> > > >> > Here we are not taking into account aggregation burst size (it is done > >> > in a rough way in chromeos implementation) and tx retries. I have not > >> > carried out any tests so far but I think IEEE80211_AIRTIME_OVERHEAD > >> > will led to a significant airtime overestimation. Do you think this > >> > can be improved? (..I agree this is not a perfect world, but .. :)) > >> > >> Hmm, yeah, looking at this again, the way I'm going this now, I should > >> probably have used the low 16-us IFS overhead for every packet. > >> > >> I guess we could do something similar to what the chromeos thing is > >> doing. I.e., adding a single "large" overhead value when we think the > >> packet is the first of a burst, and using the smaller value for the > >> rest. > >> > >> One approach could be to couple the switch to the "scheduling rounds" we > >> already have. I.e., first packet after a call to > >> ieee8021_txq_schedule_start() will get the 100-us overhead, and every > >> subsequent one will get the low one. Not sure how this will fit with > >> what the driver actually does, though, so I guess some experimentation > >> is in order. > >> > >> Ultimately, I'm not sure it matters that much whether occasionally add > >> 80 us extra to the estimate. But as you say, adding 100 us to every > >> packet is probably a bit much ;) > > > > Would it be possible to use the previous tx airtime reported by the > > driver? (not sure if it is feasible). Some drivers can report airtime > > compute in hw, the issue is it can be no not linked to the given skb > > or aggregation burst, so we should take into account burst size > > That's what we do for the fairness scheduler. And yeah, if the HW can > report after-the-fact airtime usage that is bound to be more accurate, > so I think we should keep using that for fairness. > > But for this AQL thing, we really need it ahead of time. However, I > don't think it's as important that it is super accurate. As long as we > have a reasonable estimate I think we'll be fine. We can solve any > inaccuracies by fiddling with the limit, I think. Similar to what BQL > does; dynamically adjusting it up and down. > > So for a first pass, we can just err on the side of having the limit > higher, and then iterate from there. > > >> > Moreover, can this approach be affected by some interrupt coalescing > >> > implemented by the chipset? > >> > >> Probably? Ultimately we don't really know what exactly the chipset is > >> doing, so we're guessing here, no? > > > > Here I mean if the hw relies on a 1:n tx interrupt/packet ratio (I > > guess most driver do), it would probably affect throughput, right? > > (e.g TCP) > > Yeah, this is what I alluded to above: If we set the limit too low, were > are going to kill TCP throughput. Ideally, we want the limit to be as > low as we can get it without hurting TCP (too much), but no lower. Just > doing the conversion to airtime is a way to achieve this: This will > scale the actual queue length with the achievable throughput as long as > the tx rate estimate is reasonably accurate. If needed, we can add > another layer of dynamic tuning on top using the existing BQL logic; but > I'd like to get the basic case working first... > > -Toke >
Kan Yan <kyan@google.com> writes: > Hi Toke, > > There is an updated version of AQL in the chromiumos tree implemented > in the mac80211 driver, instead of in the ath10k driver as the > original version: > https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703105/7 > https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1703106/6 Ah, that's awesome! Thank you for brining this up :) > It is based on a more recent kernel (4.14) and integrated with the > airtime fairness tx scheduler in mac80211. This version has been > tested rather extensively. I intended to use it as the basis for my > effort to bring AQL upstream, but get sidetracked by other things. I > can clean it up and send a patchset next week if you think that is the > right path. Yes, please do! AFAICT, the main difference is that your version keeps the airtime calculation itself in the driver, while mine passes up the rate and lets mac80211 do the calculation of airtime. Other than that, the differences are minor, no? I'm not actually sure which approach is best; I suspect doing all the accounting in mac80211 will help with integrating this into drivers that use minstrel; we can just add a hook in that and be done with it. Whereas if the driver has to do the accounting, we would need to add that to each driver (mt76, iwl(?)). But of course, doing things in mac80211 depends on stuffing even more stuff into the already overloaded cb field; and I'm not actually entirely sure what I've done with that will actually work. WDYT? In any case, if you post your series we'll have something to contrast against, which I think will be useful to help us converge on something we can all be happy with. Of course we'll also have to eventually integrate this with the other series that Yibo recently re-posted (the virtual time scheduler). I think that will be relatively straight forward, except I'm not sure your atomic patches will work when we also have to update the rbtree. Any thoughts on that series in general? > Sorry for the long delay and slack off on the upstream effort. Hehe, no worries. I only posted this because Dave finally bugged me into doing something about this at LPC. And hey, we're making progress now, so that's good! :) > There is some concern in this thread regarding the accuracy of the > estimated airtime using the last reported TX rate. It is indeed a > rather crude method and did not include retries in the calculation. > Besides, there are lags between firmware changing rate and host driver > get the rate update. The 16us IFS overhead is only correct for 5G and > it is actually 10us for 2.4 G. However, that hardly matters. The goal > of AQL is to prevent the firmware/hardware queue from getting bloated > or starved. There is a lot of headroom in the queue length limit (8-10 > ms) to tolerate inaccuracy in the estimate airtime. AQL doesn't > control the fine grained TX packet scheduling. It is handled by the > airtime fairness scheduler and ultimately firmware. Yeah, this was basically the point I was trying to make; this limit doesn't need to be that accurate, we just need a rough estimate. If we want to get the latency even lower later, we're better off fiddling with the queue limit value than trying to improve the airtime estimate. > There are two TX airtimes in the newer version (chromiumos 4.14 > kernel): The estimated airtime for frames pending in the queue and the > airtime reported by the firmware for the frame transmitted, which > should be accurate as the firmware supposed to take retries and > aggregation into account. The airtime fairness scheduler that does the > TX packet scheduling should use the TX airtime reported by the > firmware. That's the reason why the original implementation in the > ChromiumOS tree tries to factor in aggregation size when estimate the > airtime overhead and the later version doesn't even bother with that. Yup, makes sense. Looking at the version you linked to, though, it seems you're calling ieee80211_sta_register_airtime() with the estimated value as well? So are you double-accounting airtime, or are you adjusting for the accurate values somewhere else I don't see in that series? -Toke
On 2019-09-19 20:22, Toke Høiland-Jørgensen wrote: > From: Toke Høiland-Jørgensen <toke@redhat.com> > > Some devices have deep buffers in firmware and/or hardware which > prevents > the FQ structure in mac80211 from effectively limiting bufferbloat on > the > link. For Ethernet devices we have BQL to limit the lower-level queues, > but > this cannot be applied to mac80211 because transmit rates can vary > wildly > between packets depending on which station we are transmitting it to. > > To overcome this, we can use airtime-based queue limiting (AQL), where > we > estimate the transmission time for each packet before dequeueing it, > and > use that to limit the amount of data in-flight to the hardware. This > idea > was originally implemented as part of the out-of-tree airtime fairness > patch to ath10k[0] in chromiumos. > > This patch ports that idea over to mac80211. The basic idea is simple > enough: Whenever we dequeue a packet from the TXQs and send it to the > driver, we estimate its airtime usage, based on the last recorded TX > rate > of the station that packet is destined for. We keep a running per-AC > total > of airtime queued for the whole device, and when that total climbs > above 8 > ms' worth of data (corresponding to two maximum-sized aggregates), we > simply throttle the queues until it drops down again. > > The estimated airtime for each skb is stored in the tx_info, so we can > subtract the same amount from the running total when the skb is freed > or > recycled. The throttling mechanism relies on this accounting to be > accurate (i.e., that we are not freeing skbs without subtracting any > airtime they were accounted for), so we put the subtraction into > ieee80211_report_used_skb(). > > This patch does *not* include any mechanism to wake a throttled TXQ > again, > on the assumption that this will happen anyway as a side effect of > whatever > freed the skb (most commonly a TX completion). > > The throttling mechanism only kicks in if the queued airtime total goes > above the limit. Since mac80211 calculates the time based on the > reported > last_tx_time from the driver, the whole throttling mechanism only kicks > in > for drivers that actually report this value. With the exception of > multicast, where we always calculate an estimated tx time on the > assumption > that multicast is transmitted at the lowest (6 Mbps) rate. > > The throttling added in this patch is in addition to any throttling > already > performed by the airtime fairness mechanism, and in principle the two > mechanisms are orthogonal (and currently also uses two different > sources of > airtime). In the future, we could amend this, using the airtime > estimates > calculated by this mechanism as a fallback input to the airtime > fairness > scheduler, to enable airtime fairness even on drivers that don't have a > hardware source of airtime usage for each station. > So if it is going to work together with virtual time based mechanism in the future, the Tx criteria will be met both of below conditions, 1. Lower than g_vt 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT Are we going to maintain two kinds of airtime that one is from estimation and the other is basically from FW reporting? Meanwhile, airtime_queued will also limit the situation that we only have a station for transmission. Not sure if the peak throughput will be impacted. I believe it may work fine with 11ac in chromiumos, how about 11n and 11a? Anyway, I think this approach will help to improve performance of the virtual time based mechanism since it makes packets buffered in host instead of FW's deep queue. We have observed 2-clients case with different ratio in TCP fails to maintain the ratio because the packets arriving at host get pushed to FW immediately and thus the airtime weight sum is 0 in most of time meaning no TXQ in the rbtree. For UDP, if we pump load more than the PHY rate, the ratio can be maintained beacuse the FW queue is full and packtes begin to be buffered in host making TXQs staying on the rbtree for most of time. However, TCP has its own flow control that we can not push enough load like UDP. > out: > @@ -3676,6 +3706,9 @@ struct ieee80211_txq *ieee80211_next_txq(struct > ieee80211_hw *hw, u8 ac) > > spin_lock_bh(&local->active_txq_lock[ac]); > > + if (local->airtime_queued[ac] > IEEE80211_AIRTIME_QUEUE_LIMIT) > + goto out; > + > begin: > txqi = list_first_entry_or_null(&local->active_txqs[ac], > struct txq_info, > @@ -3753,6 +3786,9 @@ bool ieee80211_txq_may_transmit(struct > ieee80211_hw *hw, > > spin_lock_bh(&local->active_txq_lock[ac]); > > + if (local->airtime_queued[ac] > IEEE80211_AIRTIME_QUEUE_LIMIT) > + goto out; > + > if (!txqi->txq.sta) > goto out;
Yibo Zhao <yiboz@codeaurora.org> writes: > So if it is going to work together with virtual time based mechanism in > the future, the Tx criteria will be met both of below conditions, > 1. Lower than g_vt > 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT > Are we going to maintain two kinds of airtime that one is from > estimation and the other is basically from FW reporting? Yes, that was my plan. For devices that don't have FW reporting of airtime, we can fall back to the estimation; but if we do have FW reporting that is most likely going to be more accurate, so better to use that for fairness... > Meanwhile, airtime_queued will also limit the situation that we only > have a station for transmission. Not sure if the peak throughput will > be impacted. I believe it may work fine with 11ac in chromiumos, how > about 11n and 11a? Well, we will need to test that, of course. But ath9k shows that it's quite possible to run with quite shallow buffers, so with a bit of tuning I think we should be fine. If anything, slower networks need *fewer* packets queued in the firmware, and it's *easier* for the host to keep up with transmission. > Anyway, I think this approach will help to improve performance of the > virtual time based mechanism since it makes packets buffered in host > instead of FW's deep queue. We have observed 2-clients case with > different ratio in TCP fails to maintain the ratio because the packets > arriving at host get pushed to FW immediately and thus the airtime > weight sum is 0 in most of time meaning no TXQ in the rbtree. For UDP, > if we pump load more than the PHY rate, the ratio can be maintained > beacuse the FW queue is full and packtes begin to be buffered in host > making TXQs staying on the rbtree for most of time. However, TCP has its > own flow control that we can not push enough load like UDP. Yes, fixing that is exactly the point of this series :) -Toke
On 2019-09-25 16:11, Toke Høiland-Jørgensen wrote: > Yibo Zhao <yiboz@codeaurora.org> writes: > >> So if it is going to work together with virtual time based mechanism >> in >> the future, the Tx criteria will be met both of below conditions, >> 1. Lower than g_vt >> 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT > >> Are we going to maintain two kinds of airtime that one is from >> estimation and the other is basically from FW reporting? > > Yes, that was my plan. For devices that don't have FW reporting of > airtime, we can fall back to the estimation; but if we do have FW > reporting that is most likely going to be more accurate, so better to > use that for fairness... Do you mean we will use airtime reported by FW to calculate local->airtime_queued in case we have FW reporting airtime? > >> Meanwhile, airtime_queued will also limit the situation that we only >> have a station for transmission. Not sure if the peak throughput will >> be impacted. I believe it may work fine with 11ac in chromiumos, how >> about 11n and 11a? > > Well, we will need to test that, of course. But ath9k shows that it's > quite possible to run with quite shallow buffers, so with a bit of > tuning I think we should be fine. If anything, slower networks need > *fewer* packets queued in the firmware, and it's *easier* for the host > to keep up with transmission. > >> Anyway, I think this approach will help to improve performance of the >> virtual time based mechanism since it makes packets buffered in host >> instead of FW's deep queue. We have observed 2-clients case with >> different ratio in TCP fails to maintain the ratio because the packets >> arriving at host get pushed to FW immediately and thus the airtime >> weight sum is 0 in most of time meaning no TXQ in the rbtree. For UDP, >> if we pump load more than the PHY rate, the ratio can be maintained >> beacuse the FW queue is full and packtes begin to be buffered in host >> making TXQs staying on the rbtree for most of time. However, TCP has >> its >> own flow control that we can not push enough load like UDP. > > Yes, fixing that is exactly the point of this series :) > > -Toke
Yibo Zhao <yiboz@codeaurora.org> writes: > On 2019-09-25 16:11, Toke Høiland-Jørgensen wrote: >> Yibo Zhao <yiboz@codeaurora.org> writes: >> >>> So if it is going to work together with virtual time based mechanism >>> in >>> the future, the Tx criteria will be met both of below conditions, >>> 1. Lower than g_vt >>> 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT >> >>> Are we going to maintain two kinds of airtime that one is from >>> estimation and the other is basically from FW reporting? >> >> Yes, that was my plan. For devices that don't have FW reporting of >> airtime, we can fall back to the estimation; but if we do have FW >> reporting that is most likely going to be more accurate, so better to >> use that for fairness... > > Do you mean we will use airtime reported by FW to calculate > local->airtime_queued in case we have FW reporting airtime? No, the opposite; if the firmware can't report airtime, we can use the estimated values to feed into report_airtime() for the fairness calculation... -Toke
> Yes, please do! AFAICT, the main difference is that your version keeps > the airtime calculation itself in the driver, while mine passes up the > rate and lets mac80211 do the calculation of airtime. Other than that, > the differences are minor, no? > I'm not actually sure which approach is best; I suspect doing all the > accounting in mac80211 will help with integrating this into drivers that > use minstrel; we can just add a hook in that and be done with it. > Whereas if the driver has to do the accounting, we would need to add > that to each driver (mt76, iwl(?)). Yes, they are essentially doing the same thing. I kept the airtime estimation code in the ath10k just because it is already there. It is better to do that in mac80211, so it doesn't have to be duplicated for each driver and avoids the overhead of updating the estimated airtime from host driver to mac80211. > But of course, doing things in mac80211 depends on stuffing even more > stuff into the already overloaded cb field; and I'm not actually > entirely sure what I've done with that will actually work. WDYT? Either way a field in skb cb is needed to record the estimated airtime. The 'tx_time_est' shares the space with the codel 'enque_time' looks fine to me, as their lifetime doesn't overlap. There is another minor difference in the ChromiumOs version, which actually address the issue Yibo just asked: > Meanwhile, airtime_queued will also limit the situation that we only > have a station for transmission. Not sure if the peak throughput will be > impacted. I believe it may work fine with 11ac in chromiumos, how about > 11n and 11a? My version has two AQL limits, a smaller per station limit (4ms) and a larger per interface limit (24 ms). When the per interface limit has not been reached, stations are allowed to transmit up to 1/3 of the interface limits (8ms). This way it balance the needs to control latency when there are a lot of stations and to get good throughput benchmark numbers with a single client. In my test, I found increasing the AQL limit to beyond 8 ms doesn't helps peak throughput on 4x4 ath10k chipset. https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1734867/3/net/mac80211/tx.c#b3734 > Of course we'll also have to eventually integrate this with the other > series that Yibo recently re-posted (the virtual time scheduler). I > think that will be relatively straight forward, except I'm not sure your > atomic patches will work when we also have to update the rbtree. Any > thoughts on that series in general? I do like the virtual time scheduler patchset. It makes it easier to schedule an arbitrary tx queue and handles ath10k's firmware pulling mode better. I will give it a try. > Yup, makes sense. Looking at the version you linked to, though, it seems > you're calling ieee80211_sta_register_airtime() with the estimated value > as well? So are you double-accounting airtime, or are you adjusting for > the accurate values somewhere else I don't see in that series? It does not double count airtime, just both the airtime fairness scheduler and AQL use the estimate airtime. It is on an older tree and still doesn't have the patch that provides the fw airtime: https://patchwork.kernel.org/patch/10684689 On Wed, Sep 25, 2019 at 5:52 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Yibo Zhao <yiboz@codeaurora.org> writes: > > > On 2019-09-25 16:11, Toke Høiland-Jørgensen wrote: > >> Yibo Zhao <yiboz@codeaurora.org> writes: > >> > >>> So if it is going to work together with virtual time based mechanism > >>> in > >>> the future, the Tx criteria will be met both of below conditions, > >>> 1. Lower than g_vt > >>> 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT > >> > >>> Are we going to maintain two kinds of airtime that one is from > >>> estimation and the other is basically from FW reporting? > >> > >> Yes, that was my plan. For devices that don't have FW reporting of > >> airtime, we can fall back to the estimation; but if we do have FW > >> reporting that is most likely going to be more accurate, so better to > >> use that for fairness... > > > > Do you mean we will use airtime reported by FW to calculate > > local->airtime_queued in case we have FW reporting airtime? > > No, the opposite; if the firmware can't report airtime, we can use the > estimated values to feed into report_airtime() for the fairness > calculation... > > -Toke >
> > Do you mean we will use airtime reported by FW to calculate > > > local->airtime_queued in case we have FW reporting airtime? > No, the opposite; if the firmware can't report airtime, we can use the > estimated values to feed into report_airtime() for the fairness > calculation... The local->airtime_queued is the 'future' airtime for the packet pending the queue. It can't be replaced by the after the fact airtime reported from firmware for the frames transmitted. On Wed, Sep 25, 2019 at 5:27 PM Kan Yan <kyan@google.com> wrote: > > > Yes, please do! AFAICT, the main difference is that your version keeps > > the airtime calculation itself in the driver, while mine passes up the > > rate and lets mac80211 do the calculation of airtime. Other than that, > > the differences are minor, no? > > I'm not actually sure which approach is best; I suspect doing all the > > accounting in mac80211 will help with integrating this into drivers that > > use minstrel; we can just add a hook in that and be done with it. > > Whereas if the driver has to do the accounting, we would need to add > > that to each driver (mt76, iwl(?)). > > Yes, they are essentially doing the same thing. I kept the airtime > estimation code in the ath10k just because it is already there. It is > better to do that in mac80211, so it doesn't have to be duplicated for > each driver and avoids the overhead of updating the estimated airtime > from host driver to mac80211. > > > But of course, doing things in mac80211 depends on stuffing even more > > stuff into the already overloaded cb field; and I'm not actually > > entirely sure what I've done with that will actually work. WDYT? > Either way a field in skb cb is needed to record the estimated > airtime. The 'tx_time_est' shares the space with the codel > 'enque_time' looks fine to me, as their lifetime doesn't overlap. > > There is another minor difference in the ChromiumOs version, which > actually address the issue Yibo just asked: > > Meanwhile, airtime_queued will also limit the situation that we only > > have a station for transmission. Not sure if the peak throughput will be > > impacted. I believe it may work fine with 11ac in chromiumos, how about > > 11n and 11a? > > My version has two AQL limits, a smaller per station limit (4ms) and a > larger per interface limit (24 ms). When the per interface limit has > not been reached, stations are allowed to transmit up to 1/3 of the > interface limits (8ms). This way it balance the needs to control > latency when there are a lot of stations and to get good throughput > benchmark numbers with a single client. In my test, I found increasing > the AQL limit to beyond 8 ms doesn't helps peak throughput on 4x4 > ath10k chipset. > https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1734867/3/net/mac80211/tx.c#b3734 > > > Of course we'll also have to eventually integrate this with the other > > series that Yibo recently re-posted (the virtual time scheduler). I > > think that will be relatively straight forward, except I'm not sure your > > atomic patches will work when we also have to update the rbtree. Any > > thoughts on that series in general? > I do like the virtual time scheduler patchset. It makes it easier to > schedule an arbitrary tx queue and handles ath10k's firmware pulling > mode better. I will give it a try. > > > Yup, makes sense. Looking at the version you linked to, though, it seems > > you're calling ieee80211_sta_register_airtime() with the estimated value > > as well? So are you double-accounting airtime, or are you adjusting for > > the accurate values somewhere else I don't see in that series? > It does not double count airtime, just both the airtime fairness > scheduler and AQL use the estimate airtime. It is on an older tree and > still doesn't have the patch that provides the fw airtime: > https://patchwork.kernel.org/patch/10684689 > > > On Wed, Sep 25, 2019 at 5:52 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > > > Yibo Zhao <yiboz@codeaurora.org> writes: > > > > > On 2019-09-25 16:11, Toke Høiland-Jørgensen wrote: > > >> Yibo Zhao <yiboz@codeaurora.org> writes: > > >> > > >>> So if it is going to work together with virtual time based mechanism > > >>> in > > >>> the future, the Tx criteria will be met both of below conditions, > > >>> 1. Lower than g_vt > > >>> 2. Lower than IEEE80211_AIRTIME_QUEUE_LIMIT > > >> > > >>> Are we going to maintain two kinds of airtime that one is from > > >>> estimation and the other is basically from FW reporting? > > >> > > >> Yes, that was my plan. For devices that don't have FW reporting of > > >> airtime, we can fall back to the estimation; but if we do have FW > > >> reporting that is most likely going to be more accurate, so better to > > >> use that for fairness... > > > > > > Do you mean we will use airtime reported by FW to calculate > > > local->airtime_queued in case we have FW reporting airtime? > > > > No, the opposite; if the firmware can't report airtime, we can use the > > estimated values to feed into report_airtime() for the fairness > > calculation... > > > > -Toke > >
Kan Yan <kyan@google.com> writes: >> > Do you mean we will use airtime reported by FW to calculate >> >> > local->airtime_queued in case we have FW reporting airtime? >> No, the opposite; if the firmware can't report airtime, we can use the >> estimated values to feed into report_airtime() for the fairness >> calculation... > The local->airtime_queued is the 'future' airtime for the packet > pending the queue. It can't be replaced by the after the fact airtime > reported from firmware for the frames transmitted. No, but on tx_completion we could do something like this: airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est; ieee80211_report_airtime(sta, airtime); That way, if the driver sets the tx_time field to something the firmware reports, we'll use that, and otherwise we'd fall back to the estimate. Of course, there would need to be a way for the driver to opt out of this, for drivers that report out of band airtime like ath10k does :) -Toke
Kan Yan <kyan@google.com> writes: >> Yes, please do! AFAICT, the main difference is that your version keeps >> the airtime calculation itself in the driver, while mine passes up the >> rate and lets mac80211 do the calculation of airtime. Other than that, >> the differences are minor, no? >> I'm not actually sure which approach is best; I suspect doing all the >> accounting in mac80211 will help with integrating this into drivers that >> use minstrel; we can just add a hook in that and be done with it. >> Whereas if the driver has to do the accounting, we would need to add >> that to each driver (mt76, iwl(?)). > > Yes, they are essentially doing the same thing. I kept the airtime > estimation code in the ath10k just because it is already there. It is > better to do that in mac80211, so it doesn't have to be duplicated for > each driver and avoids the overhead of updating the estimated airtime > from host driver to mac80211. Right, makes sense. >> But of course, doing things in mac80211 depends on stuffing even more >> stuff into the already overloaded cb field; and I'm not actually >> entirely sure what I've done with that will actually work. WDYT? > Either way a field in skb cb is needed to record the estimated > airtime. The 'tx_time_est' shares the space with the codel > 'enque_time' looks fine to me, as their lifetime doesn't overlap. The kbuild bot pointed out that the current implementation doesn't work as it's supposed to on m68k (which is big-endian, I think?). I guess it's because the compiler puts the u16 in the "wrong half" of the space being used by the u32 it shares with, so it doesn't line up? If so, that may mean we'll need another layer struct/union wrapping; unless someone else has an idea for how to force the compiler to put the u16 in a union at the "start" of the u32 regardless of endianness? > There is another minor difference in the ChromiumOs version, which > actually address the issue Yibo just asked: >> Meanwhile, airtime_queued will also limit the situation that we only >> have a station for transmission. Not sure if the peak throughput will be >> impacted. I believe it may work fine with 11ac in chromiumos, how about >> 11n and 11a? > > My version has two AQL limits, a smaller per station limit (4ms) and a > larger per interface limit (24 ms). When the per interface limit has > not been reached, stations are allowed to transmit up to 1/3 of the > interface limits (8ms). This way it balance the needs to control > latency when there are a lot of stations and to get good throughput > benchmark numbers with a single client. In my test, I found increasing > the AQL limit to beyond 8 ms doesn't helps peak throughput on 4x4 > ath10k chipset. > https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/1734867/3/net/mac80211/tx.c#b3734 Yeah, I was wondering about that. Makes sense. Why 24ms, exactly? >> Of course we'll also have to eventually integrate this with the other >> series that Yibo recently re-posted (the virtual time scheduler). I >> think that will be relatively straight forward, except I'm not sure your >> atomic patches will work when we also have to update the rbtree. Any >> thoughts on that series in general? > I do like the virtual time scheduler patchset. It makes it easier to > schedule an arbitrary tx queue and handles ath10k's firmware pulling > mode better. I will give it a try. Yup, that was the idea. Note that the current version doesn't have the more granular locking that Felix put in for the RR-based scheduler. Guess I need to re-spin; will see if I can't get to that soon. >> Yup, makes sense. Looking at the version you linked to, though, it seems >> you're calling ieee80211_sta_register_airtime() with the estimated value >> as well? So are you double-accounting airtime, or are you adjusting for >> the accurate values somewhere else I don't see in that series? > It does not double count airtime, just both the airtime fairness > scheduler and AQL use the estimate airtime. It is on an older tree and > still doesn't have the patch that provides the fw airtime: > https://patchwork.kernel.org/patch/10684689 Ah, I see. I assumed that the other call to sta_register_airtime() was still there... -Toke
On 2019-09-19 14:22, Toke Høiland-Jørgensen wrote: > From: Toke Høiland-Jørgensen <toke@redhat.com> > > Some devices have deep buffers in firmware and/or hardware which prevents > the FQ structure in mac80211 from effectively limiting bufferbloat on the > link. For Ethernet devices we have BQL to limit the lower-level queues, but > this cannot be applied to mac80211 because transmit rates can vary wildly > between packets depending on which station we are transmitting it to. > > To overcome this, we can use airtime-based queue limiting (AQL), where we > estimate the transmission time for each packet before dequeueing it, and > use that to limit the amount of data in-flight to the hardware. This idea > was originally implemented as part of the out-of-tree airtime fairness > patch to ath10k[0] in chromiumos. > > This patch ports that idea over to mac80211. The basic idea is simple > enough: Whenever we dequeue a packet from the TXQs and send it to the > driver, we estimate its airtime usage, based on the last recorded TX rate > of the station that packet is destined for. We keep a running per-AC total > of airtime queued for the whole device, and when that total climbs above 8 > ms' worth of data (corresponding to two maximum-sized aggregates), we > simply throttle the queues until it drops down again. > > The estimated airtime for each skb is stored in the tx_info, so we can > subtract the same amount from the running total when the skb is freed or > recycled. The throttling mechanism relies on this accounting to be > accurate (i.e., that we are not freeing skbs without subtracting any > airtime they were accounted for), so we put the subtraction into > ieee80211_report_used_skb(). > > This patch does *not* include any mechanism to wake a throttled TXQ again, > on the assumption that this will happen anyway as a side effect of whatever > freed the skb (most commonly a TX completion). > > The throttling mechanism only kicks in if the queued airtime total goes > above the limit. Since mac80211 calculates the time based on the reported > last_tx_time from the driver, the whole throttling mechanism only kicks in > for drivers that actually report this value. With the exception of > multicast, where we always calculate an estimated tx time on the assumption > that multicast is transmitted at the lowest (6 Mbps) rate. > > The throttling added in this patch is in addition to any throttling already > performed by the airtime fairness mechanism, and in principle the two > mechanisms are orthogonal (and currently also uses two different sources of > airtime). In the future, we could amend this, using the airtime estimates > calculated by this mechanism as a fallback input to the airtime fairness > scheduler, to enable airtime fairness even on drivers that don't have a > hardware source of airtime usage for each station. > > [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13/drivers/net/wireless-4.2/ath/ath10k/mac.c#3845 One thing that might be missing here is dealing with airtime accounting of frames that remain queued in the driver/hardware because the station is in powersave mode. - Felix
Felix Fietkau <nbd@nbd.name> writes: > On 2019-09-19 14:22, Toke Høiland-Jørgensen wrote: >> From: Toke Høiland-Jørgensen <toke@redhat.com> >> >> Some devices have deep buffers in firmware and/or hardware which prevents >> the FQ structure in mac80211 from effectively limiting bufferbloat on the >> link. For Ethernet devices we have BQL to limit the lower-level queues, but >> this cannot be applied to mac80211 because transmit rates can vary wildly >> between packets depending on which station we are transmitting it to. >> >> To overcome this, we can use airtime-based queue limiting (AQL), where we >> estimate the transmission time for each packet before dequeueing it, and >> use that to limit the amount of data in-flight to the hardware. This idea >> was originally implemented as part of the out-of-tree airtime fairness >> patch to ath10k[0] in chromiumos. >> >> This patch ports that idea over to mac80211. The basic idea is simple >> enough: Whenever we dequeue a packet from the TXQs and send it to the >> driver, we estimate its airtime usage, based on the last recorded TX rate >> of the station that packet is destined for. We keep a running per-AC total >> of airtime queued for the whole device, and when that total climbs above 8 >> ms' worth of data (corresponding to two maximum-sized aggregates), we >> simply throttle the queues until it drops down again. >> >> The estimated airtime for each skb is stored in the tx_info, so we can >> subtract the same amount from the running total when the skb is freed or >> recycled. The throttling mechanism relies on this accounting to be >> accurate (i.e., that we are not freeing skbs without subtracting any >> airtime they were accounted for), so we put the subtraction into >> ieee80211_report_used_skb(). >> >> This patch does *not* include any mechanism to wake a throttled TXQ again, >> on the assumption that this will happen anyway as a side effect of whatever >> freed the skb (most commonly a TX completion). >> >> The throttling mechanism only kicks in if the queued airtime total goes >> above the limit. Since mac80211 calculates the time based on the reported >> last_tx_time from the driver, the whole throttling mechanism only kicks in >> for drivers that actually report this value. With the exception of >> multicast, where we always calculate an estimated tx time on the assumption >> that multicast is transmitted at the lowest (6 Mbps) rate. >> >> The throttling added in this patch is in addition to any throttling already >> performed by the airtime fairness mechanism, and in principle the two >> mechanisms are orthogonal (and currently also uses two different sources of >> airtime). In the future, we could amend this, using the airtime estimates >> calculated by this mechanism as a fallback input to the airtime fairness >> scheduler, to enable airtime fairness even on drivers that don't have a >> hardware source of airtime usage for each station. >> >> [0] https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/588190/13/drivers/net/wireless-4.2/ath/ath10k/mac.c#3845 > One thing that might be missing here is dealing with airtime accounting > of frames that remain queued in the driver/hardware because the station > is in powersave mode. Oh, right. Didn't know that could happen (I assumed those would be flushed out or something). But if we're going to go with Kan's suggestion of having per-station accounting as well as a global accounting for the device, we could just subtract the station's outstanding balance from the device total when it goes into powersave mode, and add it back once it wakes up again. At least I think that would work, no? -Toke
On 2019-09-26 15:17, Toke Høiland-Jørgensen wrote: > Oh, right. Didn't know that could happen (I assumed those would be > flushed out or something). But if we're going to go with Kan's > suggestion of having per-station accounting as well as a global > accounting for the device, we could just subtract the station's > outstanding balance from the device total when it goes into powersave > mode, and add it back once it wakes up again. At least I think that > would work, no?Yes, I think that would work. - Felix
Felix Fietkau <nbd@nbd.name> writes: > On 2019-09-26 15:17, Toke Høiland-Jørgensen wrote: >> Oh, right. Didn't know that could happen (I assumed those would be >> flushed out or something). But if we're going to go with Kan's >> suggestion of having per-station accounting as well as a global >> accounting for the device, we could just subtract the station's >> outstanding balance from the device total when it goes into powersave >> mode, and add it back once it wakes up again. At least I think that >> would work, no? >Yes, I think that would work. Great! Will incorporate that, then. -Toke
> No, but on tx_completion we could do something like this: > airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est; > ieee80211_report_airtime(sta, airtime); > That way, if the driver sets the tx_time field to something the firmware > reports, we'll use that, and otherwise we'd fall back to the estimate. > Of course, there would need to be a way for the driver to opt out of > this, for drivers that report out of band airtime like ath10k does :) I doubt that will work, unless firmware can do per frame airtime update in the skb. It is more likely that firmware provides out of band airtime update for a period of time, like an aggregation. That's the case for ath10k: https://patchwork.kernel.org/patch/10684689 Regarding handling frame for station enters power saving mode: > > >> Oh, right. Didn't know that could happen (I assumed those would be > >> flushed out or something). But if we're going to go with Kan's > >> suggestion of having per-station accounting as well as a global > >> accounting for the device, we could just subtract the station's > >> outstanding balance from the device total when it goes into powersave > >> mode, and add it back once it wakes up again. At least I think that > >> would work, no? > >Yes, I think that would work. > Great! Will incorporate that, then. I think that could work but things could be a bit more complicated. Not sure I fully understand the usage of airtime_weight_sum in [PATCH V3 1/4] mac80211: Switch to a virtual time-based airtime scheduler: in ieee80211_schedule_txq(): local->airtime_weight_sum[ac] += sta->airtime_weight; in ieee80211_sta_register_airtime(): weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight; local->airtime_v_t[ac] += airtime / weight_sum; sta->airtime[ac].v_t += airtime / sta->airtime_weight; in __ieee80211_unschedule_txq local->airtime_weight_sum[ac] -= sta->airtime_weight; I assume the purpose of airtime_weight_sum is to count station's virtual airtime proportional to the number of active stations for fairness. My concern is the per interface local->airtime_weight_sum[ac] get updated when packets are released from a txq to lower layer, but it doesn't mean the airtime will be consumed (packets get transmitted) shortly, due to events like station enter power save mode, so the weight_sum used in ieee80211_sta_register_airtime() maybe inaccurate. For architects using firmware/hardware offloading, firmware ultimately controls packet scheduling and has quite a lot of autonomy. The host driver's airtime_weight_sum may get out of sync with the number of active stations actually scheduled by firmware even without power saving events. Is this a correct understanding? I kind of think the original method of airtime accounting using deficit maybe more robust in this regard.
Kan Yan <kyan@google.com> writes: >> No, but on tx_completion we could do something like this: >> airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est; >> ieee80211_report_airtime(sta, airtime); >> That way, if the driver sets the tx_time field to something the firmware >> reports, we'll use that, and otherwise we'd fall back to the estimate. >> Of course, there would need to be a way for the driver to opt out of >> this, for drivers that report out of band airtime like ath10k does :) > > I doubt that will work, unless firmware can do per frame airtime > update in the skb. It is more likely that firmware provides out of > band airtime update for a period of time, like an aggregation. That's > the case for ath10k: https://patchwork.kernel.org/patch/10684689 No, ath10k would continue to do what it was always doing. Drivers that can report after-the-fact airtime usage per-frame (like ath9k) will continue to do that. In both of those cases, the airtime estimate is only used to throttle the queue, not to schedule for fairness. My point is just that for drivers that have *no* mechanism to report airtime usage after-the-fact, we can add a flag that will allow the values estimated by mac80211 to also be used for the fairness scheduler... > Regarding handling frame for station enters power saving mode: >> >> >> Oh, right. Didn't know that could happen (I assumed those would be >> >> flushed out or something). But if we're going to go with Kan's >> >> suggestion of having per-station accounting as well as a global >> >> accounting for the device, we could just subtract the station's >> >> outstanding balance from the device total when it goes into powersave >> >> mode, and add it back once it wakes up again. At least I think that >> >> would work, no? >> >Yes, I think that would work. >> Great! Will incorporate that, then. > > I think that could work but things could be a bit more complicated. > Not sure I fully understand the usage of airtime_weight_sum in [PATCH > V3 1/4] mac80211: Switch to a virtual time-based airtime scheduler: > > in ieee80211_schedule_txq(): > local->airtime_weight_sum[ac] += sta->airtime_weight; > > in ieee80211_sta_register_airtime(): > weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight; > local->airtime_v_t[ac] += airtime / weight_sum; > sta->airtime[ac].v_t += airtime / sta->airtime_weight; > > in __ieee80211_unschedule_txq > local->airtime_weight_sum[ac] -= sta->airtime_weight; > > I assume the purpose of airtime_weight_sum is to count station's > virtual airtime proportional to the number of active stations for > fairness. Yup, the proportion between the station's airtime weight and the total scheduled airtime weight indicates the station's fair share. > My concern is the per interface local->airtime_weight_sum[ac] get > updated when packets are released from a txq to lower layer, but it > doesn't mean the airtime will be consumed (packets get transmitted) > shortly, due to events like station enter power save mode, so the > weight_sum used in ieee80211_sta_register_airtime() maybe inaccurate. > For architects using firmware/hardware offloading, firmware ultimately > controls packet scheduling and has quite a lot of autonomy. The host > driver's airtime_weight_sum may get out of sync with the number of > active stations actually scheduled by firmware even without power > saving events. > > Is this a correct understanding? I kind of think the original method > of airtime accounting using deficit maybe more robust in this regard. You are right that this could happen, yes. However, the station is only unscheduled when its mac80211 queue runs completely empty. So the assumption is that stations that transmit continuously (which are really the ones we care about for fairness purposes), would keep being scheduled most of the time. Now, you're quite right that this assumption might be wrong, which would lead to bad results. I think the other (queue throttling) patch set would help, though; that should push the queues up into mac80211 and give the stations a higher probability of being scheduled when they are in fact backlogged. I've only tested the virtual time scheduler on ath9k, which inherently has shallow buffers in the hardware. So yeah, it may be that the virtual time-thing turns out to not work well. But the results looked encouraging on ath9k, and since it will make it easier to schedule multiple stations, I think it has some merit that makes it worth trying out. We should probably get the AQL stuff done first, though, and try the virtual time scheduler on top of that. BTW, I think Felix' concern about powersave was in relation to AQL: If we don't handle power save in that, we can end up in a situation where the budget for packets allowed to be queued in the firmware is taken up entirely by stations that are currently in powersave mode; which would throttle the device completely. So we should take that into account for AQL; for the fairness scheduler, stations in powersave are already unscheduled, so that should be fine. -Toke
On 2019-09-27 14:12, Toke Høiland-Jørgensen wrote: > Kan Yan <kyan@google.com> writes: > >>> No, but on tx_completion we could do something like this: >>> airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est; >>> ieee80211_report_airtime(sta, airtime); >>> That way, if the driver sets the tx_time field to something the >>> firmware >>> reports, we'll use that, and otherwise we'd fall back to the >>> estimate. >>> Of course, there would need to be a way for the driver to opt out of >>> this, for drivers that report out of band airtime like ath10k does :) >> >> I doubt that will work, unless firmware can do per frame airtime >> update in the skb. It is more likely that firmware provides out of >> band airtime update for a period of time, like an aggregation. That's >> the case for ath10k: https://patchwork.kernel.org/patch/10684689 > > No, ath10k would continue to do what it was always doing. Drivers that > can report after-the-fact airtime usage per-frame (like ath9k) will > continue to do that. In both of those cases, the airtime estimate is > only used to throttle the queue, not to schedule for fairness. > > My point is just that for drivers that have *no* mechanism to report > airtime usage after-the-fact, we can add a flag that will allow the > values estimated by mac80211 to also be used for the fairness > scheduler... > >> Regarding handling frame for station enters power saving mode: >>> >>> >> Oh, right. Didn't know that could happen (I assumed those would be >>> >> flushed out or something). But if we're going to go with Kan's >>> >> suggestion of having per-station accounting as well as a global >>> >> accounting for the device, we could just subtract the station's >>> >> outstanding balance from the device total when it goes into powersave >>> >> mode, and add it back once it wakes up again. At least I think that >>> >> would work, no? >>> >Yes, I think that would work. >>> Great! Will incorporate that, then. >> >> I think that could work but things could be a bit more complicated. >> Not sure I fully understand the usage of airtime_weight_sum in [PATCH >> V3 1/4] mac80211: Switch to a virtual time-based airtime scheduler: >> >> in ieee80211_schedule_txq(): >> local->airtime_weight_sum[ac] += sta->airtime_weight; >> >> in ieee80211_sta_register_airtime(): >> weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight; >> local->airtime_v_t[ac] += airtime / weight_sum; >> sta->airtime[ac].v_t += airtime / sta->airtime_weight; >> >> in __ieee80211_unschedule_txq >> local->airtime_weight_sum[ac] -= sta->airtime_weight; >> >> I assume the purpose of airtime_weight_sum is to count station's >> virtual airtime proportional to the number of active stations for >> fairness. > > Yup, the proportion between the station's airtime weight and the total > scheduled airtime weight indicates the station's fair share. > >> My concern is the per interface local->airtime_weight_sum[ac] get >> updated when packets are released from a txq to lower layer, but it >> doesn't mean the airtime will be consumed (packets get transmitted) >> shortly, due to events like station enter power save mode, so the >> weight_sum used in ieee80211_sta_register_airtime() maybe inaccurate. >> For architects using firmware/hardware offloading, firmware ultimately >> controls packet scheduling and has quite a lot of autonomy. The host >> driver's airtime_weight_sum may get out of sync with the number of >> active stations actually scheduled by firmware even without power >> saving events. >> >> Is this a correct understanding? I kind of think the original method >> of airtime accounting using deficit maybe more robust in this regard. > > You are right that this could happen, yes. However, the station is only > unscheduled when its mac80211 queue runs completely empty. So the > assumption is that stations that transmit continuously (which are > really > the ones we care about for fairness purposes), would keep being > scheduled most of the time. > > Now, you're quite right that this assumption might be wrong, which > would > lead to bad results. I think the other (queue throttling) patch set > would help, though; that should push the queues up into mac80211 and > give the stations a higher probability of being scheduled when they are > in fact backlogged. I've only tested the virtual time scheduler on > ath9k, which inherently has shallow buffers in the hardware. > > So yeah, it may be that the virtual time-thing turns out to not work > well. But the results looked encouraging on ath9k, and since it will I am not familiar with ath9k but SFAIK, ath9k is fine with virtual time-thing because it does not have firmware/hardware offloading. Thus host can see the packets backlogged in host queue and TXQs stay on rbtree for the most of time if there is continuous transmission. As a result, the algo fully works. While for ath10k like Kan said, it has firmware/hardware offloading and host cannot see the packets backlogged in host queue because they are already sent to FW queue as long as the ingress load less than PHY rate. Then TXQs are removed from the rbtree which leads to the algo not working so well. For driver that has firmware/hardware offloading, I think the key is to keep some of the packets buffered in the host even if FW queue is not full at that time. Also I believe DRR may have the same issue since only TXQs in the list contend for Tx chance. > make it easier to schedule multiple stations, I think it has some merit > that makes it worth trying out. We should probably get the AQL stuff > done first, though, and try the virtual time scheduler on top of that. Agree that we should get the AQL stuff done first since I believe it will help to fix the issue mentioned above. > > BTW, I think Felix' concern about powersave was in relation to AQL: If > we don't handle power save in that, we can end up in a situation where > the budget for packets allowed to be queued in the firmware is taken up > entirely by stations that are currently in powersave mode; which would > throttle the device completely. So we should take that into account for > AQL; for the fairness scheduler, stations in powersave are already > unscheduled, so that should be fine. > > -Toke
> No, ath10k would continue to do what it was always doing. Drivers that > can report after-the-fact airtime usage per-frame (like ath9k) will > continue to do that. In both of those cases, the airtime estimate is > only used to throttle the queue, not to schedule for fairness. You are right, I didn't realize ath9k reports per frame airtime usage. > Yeah, I was wondering about that. Makes sense. Why 24ms, exactly? The per interface 24 ms queue limit is an empirical number that works well for both achieve low latency when there is a lot of stations and get high throughput when there is only 1-2 stations. We could make it configurable. > BTW, I think Felix' concern about powersave was in relation to AQL: If > we don't handle power save in that, we can end up in a situation where >the budget for packets allowed to be queued in the firmware is taken up > entirely by stations that are currently in powersave mode; which would > throttle the device completely. So we should take that into account for > AQL; for the fairness scheduler, stations in powersave are already > unscheduled, so that should be fine. I think the accounting for the airtime of frames in the power saving queue could affect both the fairness scheduler and AQL. For chipset with firmware offload, PS handling is mostly done by firmware, so host driver's knowledge of PS state could be slightly out-of-dated. The power save behavior also make it harder to the airtime_weight correct for the fairness scheduler. Powersave mode's impact to AQL is much smaller. The lower per station queue limit is not impacted by other stations PS behavior, since the estimated future airtime is not weighted for other stations and a station won't get blocked due to others stations in PS mode. Station in PS mode do affects AQL's higher per interface limit, but in an inconsequential way. The per interface AQL queue limit is quite large (24 ms), hence airtime from packets in PS queue is unlikely to have a significant impact on it. Still, it will be better if the packet in power saving queue can be taken into account. > > make it easier to schedule multiple stations, I think it has some merit > > that makes it worth trying out. We should probably get the AQL stuff > > done first, though, and try the virtual time scheduler on top of that. > Agree that we should get the AQL stuff done first since I believe it > will help to fix the issue mentioned above. That sounds like a good plan. The virtual time scheduler is more involved and will take more work to get it right. It make sense to get AQL done first. On Fri, Sep 27, 2019 at 2:20 AM Yibo Zhao <yiboz@codeaurora.org> wrote: > > On 2019-09-27 14:12, Toke Høiland-Jørgensen wrote: > > Kan Yan <kyan@google.com> writes: > > > >>> No, but on tx_completion we could do something like this: > >>> airtime = CB(skb)->tx_time ?: CB(skb)->tx_time_est; > >>> ieee80211_report_airtime(sta, airtime); > >>> That way, if the driver sets the tx_time field to something the > >>> firmware > >>> reports, we'll use that, and otherwise we'd fall back to the > >>> estimate. > >>> Of course, there would need to be a way for the driver to opt out of > >>> this, for drivers that report out of band airtime like ath10k does :) > >> > >> I doubt that will work, unless firmware can do per frame airtime > >> update in the skb. It is more likely that firmware provides out of > >> band airtime update for a period of time, like an aggregation. That's > >> the case for ath10k: https://patchwork.kernel.org/patch/10684689 > > > > No, ath10k would continue to do what it was always doing. Drivers that > > can report after-the-fact airtime usage per-frame (like ath9k) will > > continue to do that. In both of those cases, the airtime estimate is > > only used to throttle the queue, not to schedule for fairness. > > > > My point is just that for drivers that have *no* mechanism to report > > airtime usage after-the-fact, we can add a flag that will allow the > > values estimated by mac80211 to also be used for the fairness > > scheduler... > > > >> Regarding handling frame for station enters power saving mode: > >>> > >>> >> Oh, right. Didn't know that could happen (I assumed those would be > >>> >> flushed out or something). But if we're going to go with Kan's > >>> >> suggestion of having per-station accounting as well as a global > >>> >> accounting for the device, we could just subtract the station's > >>> >> outstanding balance from the device total when it goes into powersave > >>> >> mode, and add it back once it wakes up again. At least I think that > >>> >> would work, no? > >>> >Yes, I think that would work. > >>> Great! Will incorporate that, then. > >> > >> I think that could work but things could be a bit more complicated. > >> Not sure I fully understand the usage of airtime_weight_sum in [PATCH > >> V3 1/4] mac80211: Switch to a virtual time-based airtime scheduler: > >> > >> in ieee80211_schedule_txq(): > >> local->airtime_weight_sum[ac] += sta->airtime_weight; > >> > >> in ieee80211_sta_register_airtime(): > >> weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight; > >> local->airtime_v_t[ac] += airtime / weight_sum; > >> sta->airtime[ac].v_t += airtime / sta->airtime_weight; > >> > >> in __ieee80211_unschedule_txq > >> local->airtime_weight_sum[ac] -= sta->airtime_weight; > >> > >> I assume the purpose of airtime_weight_sum is to count station's > >> virtual airtime proportional to the number of active stations for > >> fairness. > > > > Yup, the proportion between the station's airtime weight and the total > > scheduled airtime weight indicates the station's fair share. > > > >> My concern is the per interface local->airtime_weight_sum[ac] get > >> updated when packets are released from a txq to lower layer, but it > >> doesn't mean the airtime will be consumed (packets get transmitted) > >> shortly, due to events like station enter power save mode, so the > >> weight_sum used in ieee80211_sta_register_airtime() maybe inaccurate. > >> For architects using firmware/hardware offloading, firmware ultimately > >> controls packet scheduling and has quite a lot of autonomy. The host > >> driver's airtime_weight_sum may get out of sync with the number of > >> active stations actually scheduled by firmware even without power > >> saving events. > >> > >> Is this a correct understanding? I kind of think the original method > >> of airtime accounting using deficit maybe more robust in this regard. > > > > You are right that this could happen, yes. However, the station is only > > unscheduled when its mac80211 queue runs completely empty. So the > > assumption is that stations that transmit continuously (which are > > really > > the ones we care about for fairness purposes), would keep being > > scheduled most of the time. > > > > Now, you're quite right that this assumption might be wrong, which > > would > > lead to bad results. I think the other (queue throttling) patch set > > would help, though; that should push the queues up into mac80211 and > > give the stations a higher probability of being scheduled when they are > > in fact backlogged. I've only tested the virtual time scheduler on > > ath9k, which inherently has shallow buffers in the hardware. > > > > So yeah, it may be that the virtual time-thing turns out to not work > > well. But the results looked encouraging on ath9k, and since it will > > I am not familiar with ath9k but SFAIK, ath9k is fine with virtual > time-thing because it does not have firmware/hardware offloading. Thus > host can see the packets backlogged in host queue and TXQs stay on > rbtree for the most of time if there is continuous transmission. As a > result, the algo fully works. While for ath10k like Kan said, it has > firmware/hardware offloading and host cannot see the packets backlogged > in host queue because they are already sent to FW queue as long as the > ingress load less than PHY rate. Then TXQs are removed from the rbtree > which leads to the algo not working so well. For driver that has > firmware/hardware offloading, I think the key is to keep some of the > packets buffered in the host even if FW queue is not full at that time. > Also I believe DRR may have the same issue since only TXQs in the list > contend for Tx chance. > > > make it easier to schedule multiple stations, I think it has some merit > > that makes it worth trying out. We should probably get the AQL stuff > > done first, though, and try the virtual time scheduler on top of that. > > Agree that we should get the AQL stuff done first since I believe it > will help to fix the issue mentioned above. > > > > > BTW, I think Felix' concern about powersave was in relation to AQL: If > > we don't handle power save in that, we can end up in a situation where > > the budget for packets allowed to be queued in the firmware is taken up > > entirely by stations that are currently in powersave mode; which would > > throttle the device completely. So we should take that into account for > > AQL; for the fairness scheduler, stations in powersave are already > > unscheduled, so that should be fine. > > > > -Toke > > -- > Yibo
Kan Yan <kyan@google.com> writes: >> No, ath10k would continue to do what it was always doing. Drivers that >> can report after-the-fact airtime usage per-frame (like ath9k) will >> continue to do that. In both of those cases, the airtime estimate is >> only used to throttle the queue, not to schedule for fairness. > You are right, I didn't realize ath9k reports per frame airtime usage. > >> Yeah, I was wondering about that. Makes sense. Why 24ms, exactly? > The per interface 24 ms queue limit is an empirical number that works > well for both achieve low latency when there is a lot of stations and > get high throughput when there is only 1-2 stations. We could make it > configurable. Right. "Found by trial and error" is a fine answer as far as I'm concerned :) But yeah, this should probably be configurable, like BQL is. >> BTW, I think Felix' concern about powersave was in relation to AQL: If >> we don't handle power save in that, we can end up in a situation where >>the budget for packets allowed to be queued in the firmware is taken up >> entirely by stations that are currently in powersave mode; which would >> throttle the device completely. So we should take that into account for >> AQL; for the fairness scheduler, stations in powersave are already >> unscheduled, so that should be fine. > I think the accounting for the airtime of frames in the power saving > queue could affect both the fairness scheduler and AQL. > For chipset with firmware offload, PS handling is mostly done by > firmware, so host driver's knowledge of PS state could be slightly > out-of-dated. The power save behavior also make it harder to the > airtime_weight correct for the fairness scheduler. Hmm, maybe. I'm not sure how significant this effect would be, but I guess we'll need to find out :) > Powersave mode's impact to AQL is much smaller. The lower per station > queue limit is not impacted by other stations PS behavior, since the > estimated future airtime is not weighted for other stations and a > station won't get blocked due to others stations in PS mode. > Station in PS mode do affects AQL's higher per interface limit, but in > an inconsequential way. The per interface AQL queue limit is quite > large (24 ms), hence airtime from packets in PS queue is unlikely to > have a significant impact on it. Still, it will be better if the > packet in power saving queue can be taken into account. I guess the risk is lower when with a 24ms per-iface limit; but with enough stations I guess it could still happen, no? So we should probably handle this case... >> > make it easier to schedule multiple stations, I think it has some merit >> > that makes it worth trying out. We should probably get the AQL stuff >> > done first, though, and try the virtual time scheduler on top of that. >> Agree that we should get the AQL stuff done first since I believe it >> will help to fix the issue mentioned above. > That sounds like a good plan. The virtual time scheduler is more > involved and will take more work to get it right. It make sense to get > AQL done first. Cool. Are you going to submit a ported version of your implementation? Then we can work from the two submissions and see if we can't converge on something... -Toke
> I guess the risk is lower when with a 24ms per-iface limit; but with > enough stations I guess it could still happen, no? So we should probably > handle this case... Each txq (per sta, per tid) is allowed to release at least the lower AQL limit amount of packet (default 4ms), which is not affected by other station's PS behavior and 4ms should be sufficient for most use cases. The 24ms per-interface limit is an optimization to get good benchmark score in peak performance test, which usually only involve 1-2 stations. The higher limit probably won't matter anymore when there are many stations. I haven't noticed side effects due to PS behavior in the ChromiumOS version. Still, it is better to be able to take frames in PS queue in to account, > Cool. Are you going to submit a ported version of your implementation? > Then we can work from the two submissions and see if we can't converge > on something... Working on porting, should have something ready before the end of this week. On Sun, Sep 29, 2019 at 12:18 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote: > > Kan Yan <kyan@google.com> writes: > > >> No, ath10k would continue to do what it was always doing. Drivers that > >> can report after-the-fact airtime usage per-frame (like ath9k) will > >> continue to do that. In both of those cases, the airtime estimate is > >> only used to throttle the queue, not to schedule for fairness. > > You are right, I didn't realize ath9k reports per frame airtime usage. > > > >> Yeah, I was wondering about that. Makes sense. Why 24ms, exactly? > > The per interface 24 ms queue limit is an empirical number that works > > well for both achieve low latency when there is a lot of stations and > > get high throughput when there is only 1-2 stations. We could make it > > configurable. > > Right. "Found by trial and error" is a fine answer as far as I'm > concerned :) > > But yeah, this should probably be configurable, like BQL is. > > >> BTW, I think Felix' concern about powersave was in relation to AQL: If > >> we don't handle power save in that, we can end up in a situation where > >>the budget for packets allowed to be queued in the firmware is taken up > >> entirely by stations that are currently in powersave mode; which would > >> throttle the device completely. So we should take that into account for > >> AQL; for the fairness scheduler, stations in powersave are already > >> unscheduled, so that should be fine. > > I think the accounting for the airtime of frames in the power saving > > queue could affect both the fairness scheduler and AQL. > > For chipset with firmware offload, PS handling is mostly done by > > firmware, so host driver's knowledge of PS state could be slightly > > out-of-dated. The power save behavior also make it harder to the > > airtime_weight correct for the fairness scheduler. > > Hmm, maybe. I'm not sure how significant this effect would be, but I > guess we'll need to find out :) > > > > Powersave mode's impact to AQL is much smaller. The lower per station > > queue limit is not impacted by other stations PS behavior, since the > > estimated future airtime is not weighted for other stations and a > > station won't get blocked due to others stations in PS mode. > > Station in PS mode do affects AQL's higher per interface limit, but in > > an inconsequential way. The per interface AQL queue limit is quite > > large (24 ms), hence airtime from packets in PS queue is unlikely to > > have a significant impact on it. Still, it will be better if the > > packet in power saving queue can be taken into account. > > I guess the risk is lower when with a 24ms per-iface limit; but with > enough stations I guess it could still happen, no? So we should probably > handle this case... > > >> > make it easier to schedule multiple stations, I think it has some merit > >> > that makes it worth trying out. We should probably get the AQL stuff > >> > done first, though, and try the virtual time scheduler on top of that. > >> Agree that we should get the AQL stuff done first since I believe it > >> will help to fix the issue mentioned above. > > That sounds like a good plan. The virtual time scheduler is more > > involved and will take more work to get it right. It make sense to get > > AQL done first. > > Cool. Are you going to submit a ported version of your implementation? > Then we can work from the two submissions and see if we can't converge > on something... > > -Toke >
Kan Yan <kyan@google.com> writes: >> I guess the risk is lower when with a 24ms per-iface limit; but with >> enough stations I guess it could still happen, no? So we should probably >> handle this case... > Each txq (per sta, per tid) is allowed to release at least the lower > AQL limit amount of packet (default 4ms), which is not affected by > other station's PS behavior and 4ms should be sufficient for most use > cases. Ah, I thought you'd meant each station can queue MIN(4ms, 24ms-<other stations>). I see that is not the case; it's up to 10ms as long as the total is less than 20ms, and up to 4ms otherwise. > The 24ms per-interface limit is an optimization to get good benchmark > score in peak performance test, which usually only involve 1-2 > stations. Gotta get those benchmark numbers in ;) > The higher limit probably won't matter anymore when there are many > stations. I haven't noticed side effects due to PS behavior in the > ChromiumOS version. Still, it is better to be able to take frames in > PS queue in to account, As long as one station always gets its 4ms, I'm not too worried about PS; but that was not the case in my patch :) >> Cool. Are you going to submit a ported version of your implementation? >> Then we can work from the two submissions and see if we can't converge >> on something... > Working on porting, should have something ready before the end of this > week. Great! -Toke
diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c index 568b3b276931..c846c6e7f3e3 100644 --- a/net/mac80211/debugfs.c +++ b/net/mac80211/debugfs.c @@ -148,6 +148,29 @@ static const struct file_operations aqm_ops = { .llseek = default_llseek, }; +static ssize_t airtime_queued_read(struct file *file, + char __user *user_buf, + size_t count, + loff_t *ppos) +{ + struct ieee80211_local *local = file->private_data; + char buf[32 * IEEE80211_NUM_ACS], *p = buf; + u8 ac; + + for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) + p += scnprintf(p, sizeof(buf)+buf-p, "AC%u: %u\n", ac, + local->airtime_queued[ac]); + + return simple_read_from_buffer(userbuf, count, ppos, buf, p - buf); + +} + +static const struct file_operations airtime_queued_ops = { + .read = airtime_queued_read, + .open = simple_open, + .llseek = default_llseek, +}; + static ssize_t force_tx_status_read(struct file *file, char __user *user_buf, size_t count, @@ -440,6 +463,7 @@ void debugfs_hw_add(struct ieee80211_local *local) debugfs_create_u16("airtime_flags", 0600, phyd, &local->airtime_flags); + DEBUGFS_ADD(airtime_queued); statsd = debugfs_create_dir("statistics", phyd); diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h index 9de5390411ba..6bebfe80ed29 100644 --- a/net/mac80211/ieee80211_i.h +++ b/net/mac80211/ieee80211_i.h @@ -63,6 +63,12 @@ struct ieee80211_local; #define IEEE80211_RECIPROCAL_DIVISOR 0x100000000ULL #define IEEE80211_RECIPROCAL_SHIFT 32 +/* constants used for airtime queue limit */ +#define IEEE80211_AIRTIME_QUEUE_LIMIT 8000 /* 8 ms */ +#define IEEE80211_AIRTIME_OVERHEAD 100 +#define IEEE80211_AIRTIME_OVERHEAD_IFS 16 +#define IEEE80211_AIRTIME_MINRATE_RECIPROCAL (IEEE80211_RECIPROCAL_DIVISOR / 6000) + /* * Some APs experience problems when working with U-APSD. Decreasing the * probability of that happening by using legacy mode for all ACs but VO isn't @@ -1144,6 +1150,7 @@ struct ieee80211_local { spinlock_t active_txq_lock[IEEE80211_NUM_ACS]; struct list_head active_txqs[IEEE80211_NUM_ACS]; u16 schedule_round[IEEE80211_NUM_ACS]; + u32 airtime_queued[IEEE80211_NUM_ACS]; u16 airtime_flags; diff --git a/net/mac80211/status.c b/net/mac80211/status.c index ab8ba5835ca0..e63a70657050 100644 --- a/net/mac80211/status.c +++ b/net/mac80211/status.c @@ -711,6 +711,28 @@ static void ieee80211_report_used_skb(struct ieee80211_local *local, skb->wifi_acked = acked; } + if (info->control.tx_time_est) { + u8 *qc, ac; + int tid; + + if (ieee80211_is_data_qos(hdr->frame_control)) { + qc = ieee80211_get_qos_ctl(hdr); + tid = qc[0] & 0xf; + ac = ieee80211_ac_from_tid(tid); + } else { + ac = IEEE80211_AC_BE; + } + + spin_lock_bh(&local->active_txq_lock[ac]); + /* sanity check to make sure we don't underflow */ + if (WARN_ON_ONCE(info->control.tx_time_est > local->airtime_queued[ac])) + local->airtime_queued[ac] = 0; + else + local->airtime_queued[ac] -= info->control.tx_time_est; + spin_unlock_bh(&local->active_txq_lock[ac]); + + } + ieee80211_led_tx(local); if (skb_has_frag_list(skb)) { diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c index 1fa422782905..2b8564621ecf 100644 --- a/net/mac80211/tx.c +++ b/net/mac80211/tx.c @@ -3546,9 +3546,19 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, struct ieee80211_tx_data tx; ieee80211_tx_result r; struct ieee80211_vif *vif = txq->vif; + u32 airtime = 0, airtime_queued; + u8 ac = txq->ac; + u32 pktlen; WARN_ON_ONCE(softirq_count() == 0); + spin_lock_bh(&local->active_txq_lock[ac]); + airtime_queued = local->airtime_queued[ac]; + spin_unlock_bh(&local->active_txq_lock[ac]); + + if (airtime_queued > IEEE80211_AIRTIME_QUEUE_LIMIT) + return NULL; + begin: spin_lock_bh(&fq->lock); @@ -3581,8 +3591,19 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, tx.skb = skb; tx.sdata = vif_to_sdata(info->control.vif); - if (txq->sta) + pktlen = skb->len + 38; + if (txq->sta) { tx.sta = container_of(txq->sta, struct sta_info, sta); + if (tx.sta->last_tx_bitrate) { + airtime = (pktlen * 8 * 1000 * + tx.sta->last_tx_bitrate_reciprocal) >> IEEE80211_RECIPROCAL_SHIFT; + airtime += IEEE80211_AIRTIME_OVERHEAD; + } + } else { + airtime = (pktlen * 8 * 1000 * + IEEE80211_AIRTIME_MINRATE_RECIPROCAL) >> IEEE80211_RECIPROCAL_SHIFT; + airtime += IEEE80211_AIRTIME_OVERHEAD; + } /* * The key can be removed while the packet was queued, so need to call @@ -3659,6 +3680,15 @@ struct sk_buff *ieee80211_tx_dequeue(struct ieee80211_hw *hw, } IEEE80211_SKB_CB(skb)->control.vif = vif; + + if (airtime) { + info->control.tx_time_est = airtime; + + spin_lock_bh(&local->active_txq_lock[ac]); + local->airtime_queued[ac] += airtime; + spin_unlock_bh(&local->active_txq_lock[ac]); + } + return skb; out: @@ -3676,6 +3706,9 @@ struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac) spin_lock_bh(&local->active_txq_lock[ac]); + if (local->airtime_queued[ac] > IEEE80211_AIRTIME_QUEUE_LIMIT) + goto out; + begin: txqi = list_first_entry_or_null(&local->active_txqs[ac], struct txq_info, @@ -3753,6 +3786,9 @@ bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw, spin_lock_bh(&local->active_txq_lock[ac]); + if (local->airtime_queued[ac] > IEEE80211_AIRTIME_QUEUE_LIMIT) + goto out; + if (!txqi->txq.sta) goto out;