diff mbox series

[RFC/RFT] mac80211: Switch to a virtual time-based airtime scheduler

Message ID 20190215170512.31512-1-toke@redhat.com (mailing list archive)
State RFC
Delegated to: Johannes Berg
Headers show
Series [RFC/RFT] mac80211: Switch to a virtual time-based airtime scheduler | expand

Commit Message

Toke Høiland-Jørgensen Feb. 15, 2019, 5:05 p.m. UTC
This switches the airtime scheduler in mac80211 to use a virtual time-based
scheduler instead of the round-robin scheduler used before. This has a
couple of advantages:

- No need to sync up the round-robin scheduler in firmware/hardware with
  the round-robin airtime scheduler.

- If several stations are eligible for transmission we can schedule both of
  them; no need to hard-block the scheduling rotation until the head of the
  queue has used up its quantum.

- The check of whether a station is eligible for transmission becomes
  simpler (in ieee80211_txq_may_transmit()).

The drawback is that scheduling becomes slightly more expensive, as we need
to maintain an rbtree of TXQs sorted by virtual time. This means that
ieee80211_register_airtime() becomes O(logN) in the number of currently
scheduled TXQs. However, hopefully this number rarely grows too big (it's
only TXQs currently backlogged, not all associated stations), so it
shouldn't be too big of an issue.

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
---
This is basically the idea I mentioned earlier for a different way to
handle the airtime scheduling.

I've tested it on ath9k, where it achieves the same fairness and
weighing properties as the old scheduler. It would be good if you could
test it on your ath10k setup, Rajkumar; and all of you please comment on
whether you agree that this is better from an API point of view.

 net/mac80211/debugfs.c     |  49 ++++++++-
 net/mac80211/debugfs_sta.c |  16 +--
 net/mac80211/ieee80211_i.h |  15 ++-
 net/mac80211/main.c        |   2 +-
 net/mac80211/sta_info.c    |  19 +++-
 net/mac80211/sta_info.h    |   3 +-
 net/mac80211/tx.c          | 217 ++++++++++++++++++++++++-------------
 7 files changed, 225 insertions(+), 96 deletions(-)

Comments

Dave Taht Feb. 15, 2019, 7:44 p.m. UTC | #1
On Fri, Feb 15, 2019 at 9:05 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> This switches the airtime scheduler in mac80211 to use a virtual time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:
>
> - No need to sync up the round-robin scheduler in firmware/hardware with
>   the round-robin airtime scheduler.
>
> - If several stations are eligible for transmission we can schedule both of
>   them; no need to hard-block the scheduling rotation until the head of the
>   queue has used up its quantum.

This sounds like this will support some concepts in 802.11ax much better?

Do we have any feedback from ax folk (any platform - qca, intel,
mediatek, broadcom) as to whether this will help?

> - The check of whether a station is eligible for transmission becomes
>   simpler (in ieee80211_txq_may_transmit()).
>
> The drawback is that scheduling becomes slightly more expensive, as we need
> to maintain an rbtree of TXQs sorted by virtual time. This means that
> ieee80211_register_airtime() becomes O(logN) in the number of currently
> scheduled TXQs. However, hopefully this number rarely grows too big (it's
> only TXQs currently backlogged, not all associated stations), so it
> shouldn't be too big of an issue.

SGTM.

> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
> This is basically the idea I mentioned earlier for a different way to
> handle the airtime scheduling.
>
> I've tested it on ath9k, where it achieves the same fairness and
> weighing properties as the old scheduler. It would be good if you could
> test it on your ath10k setup, Rajkumar; and all of you please comment on
> whether you agree that this is better from an API point of view.
>
>  net/mac80211/debugfs.c     |  49 ++++++++-
>  net/mac80211/debugfs_sta.c |  16 +--
>  net/mac80211/ieee80211_i.h |  15 ++-
>  net/mac80211/main.c        |   2 +-
>  net/mac80211/sta_info.c    |  19 +++-
>  net/mac80211/sta_info.h    |   3 +-
>  net/mac80211/tx.c          | 217 ++++++++++++++++++++++++-------------
>  7 files changed, 225 insertions(+), 96 deletions(-)
>
> diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
> index 343ad0a915e4..93b9ed2f9451 100644
> --- a/net/mac80211/debugfs.c
> +++ b/net/mac80211/debugfs.c
> @@ -150,6 +150,47 @@ static const struct file_operations aqm_ops = {
>         .llseek = default_llseek,
>  };
>
> +static ssize_t airtime_read(struct file *file,
> +                           char __user *user_buf,
> +                           size_t count,
> +                           loff_t *ppos)
> +{
> +       struct ieee80211_local *local = file->private_data;
> +       char buf[200];
> +       u64 v_t[IEEE80211_NUM_ACS];
> +       u64 wt[IEEE80211_NUM_ACS];
> +       int len = 0, ac;
> +
> +       for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
> +               spin_lock_bh(&local->active_txq_lock[ac]);
> +               v_t[ac] = local->airtime_v_t[ac];
> +               wt[ac] = local->airtime_weight_sum[ac];
> +               spin_unlock_bh(&local->active_txq_lock[ac]);
> +       }
> +       len = scnprintf(buf, sizeof(buf),
> +                       "\tVO         VI         BE         BK\n"
> +                       "Virt-t\t%-10llu %-10llu %-10llu %-10llu\n"
> +                       "Weight\t%-10llu %-10llu %-10llu %-10llu\n",
> +                       v_t[0],
> +                       v_t[1],
> +                       v_t[2],
> +                       v_t[3],
> +                       wt[0],
> +                       wt[1],
> +                       wt[2],
> +                       wt[3]);
> +
> +       return simple_read_from_buffer(user_buf, count, ppos,
> +                                      buf, len);
> +}
> +
> +static const struct file_operations airtime_ops = {
> +       .read = airtime_read,
> +       .open = simple_open,
> +       .llseek = default_llseek,
> +};
> +
> +
>  #ifdef CONFIG_PM
>  static ssize_t reset_write(struct file *file, const char __user *user_buf,
>                            size_t count, loff_t *ppos)
> @@ -384,8 +425,12 @@ void debugfs_hw_add(struct ieee80211_local *local)
>         if (local->ops->wake_tx_queue)
>                 DEBUGFS_ADD_MODE(aqm, 0600);
>
> -       debugfs_create_u16("airtime_flags", 0600,
> -                          phyd, &local->airtime_flags);
> +       if (wiphy_ext_feature_isset(local->hw.wiphy,
> +                                   NL80211_EXT_FEATURE_AIRTIME_FAIRNESS)) {
> +               DEBUGFS_ADD_MODE(airtime, 0600);
> +               debugfs_create_u16("airtime_flags", 0600,
> +                                  phyd, &local->airtime_flags);
> +       }
>
>         statsd = debugfs_create_dir("statistics", phyd);
>
> diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
> index 3aa618dcc58e..80028da27b98 100644
> --- a/net/mac80211/debugfs_sta.c
> +++ b/net/mac80211/debugfs_sta.c
> @@ -203,7 +203,7 @@ static ssize_t sta_airtime_read(struct file *file, char __user *userbuf,
>         size_t bufsz = 200;
>         char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf;
>         u64 rx_airtime = 0, tx_airtime = 0;
> -       s64 deficit[IEEE80211_NUM_ACS];
> +       u64 v_t[IEEE80211_NUM_ACS];
>         ssize_t rv;
>         int ac;
>
> @@ -214,20 +214,20 @@ static ssize_t sta_airtime_read(struct file *file, char __user *userbuf,
>                 spin_lock_bh(&local->active_txq_lock[ac]);
>                 rx_airtime += sta->airtime[ac].rx_airtime;
>                 tx_airtime += sta->airtime[ac].tx_airtime;
> -               deficit[ac] = sta->airtime[ac].deficit;
> +               v_t[ac] = sta->airtime[ac].v_t;
>                 spin_unlock_bh(&local->active_txq_lock[ac]);
>         }
>
>         p += scnprintf(p, bufsz + buf - p,
>                 "RX: %llu us\nTX: %llu us\nWeight: %u\n"
> -               "Deficit: VO: %lld us VI: %lld us BE: %lld us BK: %lld us\n",
> +               "Virt-T: VO: %lld us VI: %lld us BE: %lld us BK: %lld us\n",
>                 rx_airtime,
>                 tx_airtime,
>                 sta->airtime_weight,
> -               deficit[0],
> -               deficit[1],
> -               deficit[2],
> -               deficit[3]);
> +               v_t[0],
> +               v_t[1],
> +               v_t[2],
> +               v_t[3]);
>
>         rv = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf);
>         kfree(buf);
> @@ -245,7 +245,7 @@ static ssize_t sta_airtime_write(struct file *file, const char __user *userbuf,
>                 spin_lock_bh(&local->active_txq_lock[ac]);
>                 sta->airtime[ac].rx_airtime = 0;
>                 sta->airtime[ac].tx_airtime = 0;
> -               sta->airtime[ac].deficit = sta->airtime_weight;
> +               sta->airtime[ac].v_t = 0;
>                 spin_unlock_bh(&local->active_txq_lock[ac]);
>         }
>
> diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
> index 056b16bce3b0..994a70ee0ec0 100644
> --- a/net/mac80211/ieee80211_i.h
> +++ b/net/mac80211/ieee80211_i.h
> @@ -840,8 +840,7 @@ struct txq_info {
>         struct codel_vars def_cvars;
>         struct codel_stats cstats;
>         struct sk_buff_head frags;
> -       struct list_head schedule_order;
> -       u16 schedule_round;
> +       struct rb_node schedule_order;
>         unsigned long flags;
>
>         /* keep last! */
> @@ -1135,8 +1134,10 @@ struct ieee80211_local {
>
>         /* protects active_txqs and txqi->schedule_order */
>         spinlock_t active_txq_lock[IEEE80211_NUM_ACS];
> -       struct list_head active_txqs[IEEE80211_NUM_ACS];
> -       u16 schedule_round[IEEE80211_NUM_ACS];
> +       struct rb_root_cached active_txqs[IEEE80211_NUM_ACS];
> +       struct rb_node *schedule_pos[IEEE80211_NUM_ACS];
> +       u64 airtime_v_t[IEEE80211_NUM_ACS];
> +       u64 airtime_weight_sum[IEEE80211_NUM_ACS];
>
>         u16 airtime_flags;
>
> @@ -1765,6 +1766,12 @@ int ieee80211_tx_control_port(struct wiphy *wiphy, struct net_device *dev,
>                               const u8 *buf, size_t len,
>                               const u8 *dest, __be16 proto, bool unencrypted);
>
> +void ieee80211_resort_txq(struct ieee80211_hw *hw,
> +                         struct ieee80211_txq *txq);
> +void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
> +                             struct ieee80211_txq *txq);
> +
> +
>  /* HT */
>  void ieee80211_apply_htcap_overrides(struct ieee80211_sub_if_data *sdata,
>                                      struct ieee80211_sta_ht_cap *ht_cap);
> diff --git a/net/mac80211/main.c b/net/mac80211/main.c
> index 71005b6dfcd1..f207bc284921 100644
> --- a/net/mac80211/main.c
> +++ b/net/mac80211/main.c
> @@ -666,7 +666,7 @@ struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t priv_data_len,
>         spin_lock_init(&local->queue_stop_reason_lock);
>
>         for (i = 0; i < IEEE80211_NUM_ACS; i++) {
> -               INIT_LIST_HEAD(&local->active_txqs[i]);
> +               local->active_txqs[i] = RB_ROOT_CACHED;
>                 spin_lock_init(&local->active_txq_lock[i]);
>         }
>         local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
> index 11f058987a54..9d01fdd86e2d 100644
> --- a/net/mac80211/sta_info.c
> +++ b/net/mac80211/sta_info.c
> @@ -389,7 +389,6 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
>         for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>                 skb_queue_head_init(&sta->ps_tx_buf[i]);
>                 skb_queue_head_init(&sta->tx_filtered[i]);
> -               sta->airtime[i].deficit = sta->airtime_weight;
>         }
>
>         for (i = 0; i < IEEE80211_NUM_TIDS; i++)
> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
>  {
>         struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>         struct ieee80211_local *local = sta->sdata->local;
> +       struct ieee80211_txq *txq = sta->sta.txq[tid];
>         u8 ac = ieee80211_ac_from_tid(tid);
> -       u32 airtime = 0;
> +       u64 airtime = 0, weight_sum;
> +
> +       if (!txq)
> +               return;
>
>         if (sta->local->airtime_flags & AIRTIME_USE_TX)
>                 airtime += tx_airtime;
>         if (sta->local->airtime_flags & AIRTIME_USE_RX)
>                 airtime += rx_airtime;
>
> +       /* Weights scale so the unit weight is 256 */
> +       airtime <<= 8;
> +
>         spin_lock_bh(&local->active_txq_lock[ac]);
> +
>         sta->airtime[ac].tx_airtime += tx_airtime;
>         sta->airtime[ac].rx_airtime += rx_airtime;
> -       sta->airtime[ac].deficit -= airtime;
> +
> +       weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
> +
> +       local->airtime_v_t[ac] += airtime / weight_sum;
> +       sta->airtime[ac].v_t += airtime / sta->airtime_weight;
> +       ieee80211_resort_txq(&local->hw, txq);
> +
>         spin_unlock_bh(&local->active_txq_lock[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_sta_register_airtime);
> diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
> index 05647d835894..4b901a2a998e 100644
> --- a/net/mac80211/sta_info.h
> +++ b/net/mac80211/sta_info.h
> @@ -130,11 +130,12 @@ enum ieee80211_agg_stop_reason {
>  /* Debugfs flags to enable/disable use of RX/TX airtime in scheduler */
>  #define AIRTIME_USE_TX         BIT(0)
>  #define AIRTIME_USE_RX         BIT(1)
> +#define AIRTIME_GRACE 500 /* usec of grace period before reset */
>
>  struct airtime_info {
>         u64 rx_airtime;
>         u64 tx_airtime;
> -       s64 deficit;
> +       u64 v_t;
>  };
>
>  struct sta_info;
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 61c7ea9de2cc..8e125df8ade0 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1449,7 +1449,7 @@ void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
>         codel_vars_init(&txqi->def_cvars);
>         codel_stats_init(&txqi->cstats);
>         __skb_queue_head_init(&txqi->frags);
> -       INIT_LIST_HEAD(&txqi->schedule_order);
> +       RB_CLEAR_NODE(&txqi->schedule_order);
>
>         txqi->txq.vif = &sdata->vif;
>
> @@ -1493,9 +1493,7 @@ void ieee80211_txq_purge(struct ieee80211_local *local,
>         ieee80211_purge_tx_queue(&local->hw, &txqi->frags);
>         spin_unlock_bh(&fq->lock);
>
> -       spin_lock_bh(&local->active_txq_lock[txqi->txq.ac]);
> -       list_del_init(&txqi->schedule_order);
> -       spin_unlock_bh(&local->active_txq_lock[txqi->txq.ac]);
> +       ieee80211_unschedule_txq(&local->hw, &txqi->txq);
>  }
>
>  void ieee80211_txq_set_params(struct ieee80211_local *local)
> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
>  {
>         struct ieee80211_local *local = hw_to_local(hw);
> +       struct rb_node *node = local->schedule_pos[ac];
>         struct txq_info *txqi = NULL;
> +       bool first = false;
>
>         lockdep_assert_held(&local->active_txq_lock[ac]);
>
> - begin:
> -       txqi = list_first_entry_or_null(&local->active_txqs[ac],
> -                                       struct txq_info,
> -                                       schedule_order);
> -       if (!txqi)
> +       if (!node) {
> +               node = rb_first_cached(&local->active_txqs[ac]);
> +               first = true;
> +       } else
> +               node = rb_next(node);
> +
> +       if (!node)
>                 return NULL;
>
> +       txqi = container_of(node, struct txq_info, schedule_order);
> +
>         if (txqi->txq.sta) {
>                 struct sta_info *sta = container_of(txqi->txq.sta,
>                                                 struct sta_info, sta);
>
> -               if (sta->airtime[txqi->txq.ac].deficit < 0) {
> -                       sta->airtime[txqi->txq.ac].deficit +=
> -                               sta->airtime_weight;
> -                       list_move_tail(&txqi->schedule_order,
> -                                      &local->active_txqs[txqi->txq.ac]);
> -                       goto begin;
> +               if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
> +                       if (first)
> +                               local->airtime_v_t[ac] = sta->airtime[ac].v_t;
> +                       else
> +                               return NULL;
>                 }
>         }
>
>
> -       if (txqi->schedule_round == local->schedule_round[ac])
> -               return NULL;
> -
> -       list_del_init(&txqi->schedule_order);
> -       txqi->schedule_round = local->schedule_round[ac];
> +       local->schedule_pos[ac] = node;
>         return &txqi->txq;
>  }
>  EXPORT_SYMBOL(ieee80211_next_txq);
>
> -void ieee80211_return_txq(struct ieee80211_hw *hw,
> +static void __ieee80211_insert_txq(struct rb_root_cached *root,
> +                                  struct txq_info *txqi, u8 ac)
> +{
> +       struct rb_node **new = &root->rb_root.rb_node;
> +       struct rb_node *parent = NULL;
> +       struct txq_info *__txqi;
> +       bool leftmost = true;
> +
> +       while (*new) {
> +               parent = *new;
> +               __txqi = rb_entry(parent, struct txq_info, schedule_order);
> +
> +               if (!txqi->txq.sta) {
> +                       /* new txqi has no sta - insert to the left */
> +                       new = &parent->rb_left;
> +               } else if (!__txqi->txq.sta) {
> +                       /* existing txqi has no sta - insert to the right */
> +                       new = &parent->rb_right;
> +                       leftmost = false;
> +               } else {
> +                       struct sta_info *old_sta = container_of(__txqi->txq.sta,
> +                                                               struct sta_info,
> +                                                               sta);
> +                       struct sta_info *new_sta = container_of(txqi->txq.sta,
> +                                                               struct sta_info,
> +                                                               sta);
> +
> +                       if (new_sta->airtime[ac].v_t <= old_sta->airtime[ac].v_t)
> +                               new = &parent->rb_left;
> +                       else {
> +                               new = &parent->rb_right;
> +                               leftmost = false;
> +                       }
> +
> +               }
> +       }
> +
> +       rb_link_node(&txqi->schedule_order, parent, new);
> +       rb_insert_color_cached(&txqi->schedule_order, root, leftmost);
> +}
> +
> +void ieee80211_schedule_txq(struct ieee80211_hw *hw,
> +                           struct ieee80211_txq *txq)
> +       __acquires(txq_lock) __releases(txq_lock)
> +{
> +       struct ieee80211_local *local = hw_to_local(hw);
> +       struct txq_info *txqi = to_txq_info(txq);
> +       u8 ac = txq->ac;
> +
> +       spin_lock_bh(&local->active_txq_lock[ac]);
> +
> +       if (!RB_EMPTY_NODE(&txqi->schedule_order))
> +               goto out;
> +
> +       if (txq->sta) {
> +               struct sta_info *sta = container_of(txq->sta,
> +                                                   struct sta_info, sta);
> +
> +               local->airtime_weight_sum[ac] += sta->airtime_weight;
> +               if (local->airtime_v_t[ac] > AIRTIME_GRACE)
> +                       sta->airtime[ac].v_t = max(local->airtime_v_t[ac] - AIRTIME_GRACE,
> +                                                  sta->airtime[ac].v_t);
> +       }
> +
> +       __ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
> +
> + out:
> +       spin_unlock_bh(&local->active_txq_lock[ac]);
> +}
> +EXPORT_SYMBOL(ieee80211_schedule_txq);
> +
> +void ieee80211_resort_txq(struct ieee80211_hw *hw,
>                           struct ieee80211_txq *txq)
>  {
>         struct ieee80211_local *local = hw_to_local(hw);
>         struct txq_info *txqi = to_txq_info(txq);
> +       u8 ac = txq->ac;
> +
> +       if (!RB_EMPTY_NODE(&txqi->schedule_order)) {
> +               rb_erase_cached(&txqi->schedule_order,
> +                               &local->active_txqs[ac]);
> +               RB_CLEAR_NODE(&txqi->schedule_order);
> +               __ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
> +       }
> +}
> +
> +static void __ieee80211_unschedule_txq(struct ieee80211_hw *hw,
> +                                      struct ieee80211_txq *txq)
> +{
> +       struct ieee80211_local *local = hw_to_local(hw);
> +       struct txq_info *txqi = to_txq_info(txq);
> +       u8 ac = txq->ac;
>
>         lockdep_assert_held(&local->active_txq_lock[txq->ac]);
>
> -       if (list_empty(&txqi->schedule_order) &&
> -           (!skb_queue_empty(&txqi->frags) || txqi->tin.backlog_packets)) {
> -               /* If airtime accounting is active, always enqueue STAs at the
> -                * head of the list to ensure that they only get moved to the
> -                * back by the airtime DRR scheduler once they have a negative
> -                * deficit. A station that already has a negative deficit will
> -                * get immediately moved to the back of the list on the next
> -                * call to ieee80211_next_txq().
> -                */
> -               if (txqi->txq.sta &&
> -                   wiphy_ext_feature_isset(local->hw.wiphy,
> -                                           NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
> -                       list_add(&txqi->schedule_order,
> -                                &local->active_txqs[txq->ac]);
> -               else
> -                       list_add_tail(&txqi->schedule_order,
> -                                     &local->active_txqs[txq->ac]);
> +       if (RB_EMPTY_NODE(&txqi->schedule_order))
> +               return;
> +
> +       if (txq->sta) {
> +               struct sta_info *sta = container_of(txq->sta,
> +                                                   struct sta_info, sta);
> +
> +               local->airtime_weight_sum[ac] -= sta->airtime_weight;
>         }
> +
> +       rb_erase_cached(&txqi->schedule_order,
> +                       &local->active_txqs[txq->ac]);
> +       RB_CLEAR_NODE(&txqi->schedule_order);
>  }
> -EXPORT_SYMBOL(ieee80211_return_txq);
>
> -void ieee80211_schedule_txq(struct ieee80211_hw *hw,
> -                           struct ieee80211_txq *txq)
> +void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
> +                             struct ieee80211_txq *txq)
>         __acquires(txq_lock) __releases(txq_lock)
>  {
>         struct ieee80211_local *local = hw_to_local(hw);
>
>         spin_lock_bh(&local->active_txq_lock[txq->ac]);
> -       ieee80211_return_txq(hw, txq);
> +       __ieee80211_unschedule_txq(hw, txq);
>         spin_unlock_bh(&local->active_txq_lock[txq->ac]);
>  }
> -EXPORT_SYMBOL(ieee80211_schedule_txq);
> +
> +void ieee80211_return_txq(struct ieee80211_hw *hw,
> +                         struct ieee80211_txq *txq)
> +{
> +       struct ieee80211_local *local = hw_to_local(hw);
> +       struct txq_info *txqi = to_txq_info(txq);
> +
> +       lockdep_assert_held(&local->active_txq_lock[txq->ac]);
> +
> +       if (!RB_EMPTY_NODE(&txqi->schedule_order) &&
> +           (skb_queue_empty(&txqi->frags) && !txqi->tin.backlog_packets))
> +               __ieee80211_unschedule_txq(hw, txq);
> +}
> +EXPORT_SYMBOL(ieee80211_return_txq);
>
>  bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
>                                 struct ieee80211_txq *txq)
>  {
>         struct ieee80211_local *local = hw_to_local(hw);
> -       struct txq_info *iter, *tmp, *txqi = to_txq_info(txq);
> +       struct txq_info *txqi = to_txq_info(txq);
>         struct sta_info *sta;
>         u8 ac = txq->ac;
>
>         lockdep_assert_held(&local->active_txq_lock[ac]);
>
>         if (!txqi->txq.sta)
> -               goto out;
> -
> -       if (list_empty(&txqi->schedule_order))
> -               goto out;
> -
> -       list_for_each_entry_safe(iter, tmp, &local->active_txqs[ac],
> -                                schedule_order) {
> -               if (iter == txqi)
> -                       break;
> -
> -               if (!iter->txq.sta) {
> -                       list_move_tail(&iter->schedule_order,
> -                                      &local->active_txqs[ac]);
> -                       continue;
> -               }
> -               sta = container_of(iter->txq.sta, struct sta_info, sta);
> -               if (sta->airtime[ac].deficit < 0)
> -                       sta->airtime[ac].deficit += sta->airtime_weight;
> -               list_move_tail(&iter->schedule_order, &local->active_txqs[ac]);
> -       }
> +               return true;
>
>         sta = container_of(txqi->txq.sta, struct sta_info, sta);
> -       if (sta->airtime[ac].deficit >= 0)
> -               goto out;
> -
> -       sta->airtime[ac].deficit += sta->airtime_weight;
> -       list_move_tail(&txqi->schedule_order, &local->active_txqs[ac]);
> -
> -       return false;
> -out:
> -       if (!list_empty(&txqi->schedule_order))
> -               list_del_init(&txqi->schedule_order);
> -
> -       return true;
> +       return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_txq_may_transmit);
>
> @@ -3769,7 +3832,6 @@ void ieee80211_txq_schedule_start(struct ieee80211_hw *hw, u8 ac)
>         struct ieee80211_local *local = hw_to_local(hw);
>
>         spin_lock_bh(&local->active_txq_lock[ac]);
> -       local->schedule_round[ac]++;
>  }
>  EXPORT_SYMBOL(ieee80211_txq_schedule_start);
>
> @@ -3778,6 +3840,7 @@ void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
>  {
>         struct ieee80211_local *local = hw_to_local(hw);
>
> +       local->schedule_pos[ac] = NULL;
>         spin_unlock_bh(&local->active_txq_lock[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_txq_schedule_end);
> --
> 2.20.1
>
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
Toke Høiland-Jørgensen March 5, 2019, 3:45 p.m. UTC | #2
Toke Høiland-Jørgensen <toke@redhat.com> writes:

> This switches the airtime scheduler in mac80211 to use a virtual time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:
>
> - No need to sync up the round-robin scheduler in firmware/hardware with
>   the round-robin airtime scheduler.
>
> - If several stations are eligible for transmission we can schedule both of
>   them; no need to hard-block the scheduling rotation until the head of the
>   queue has used up its quantum.
>
> - The check of whether a station is eligible for transmission becomes
>   simpler (in ieee80211_txq_may_transmit()).
>
> The drawback is that scheduling becomes slightly more expensive, as we need
> to maintain an rbtree of TXQs sorted by virtual time. This means that
> ieee80211_register_airtime() becomes O(logN) in the number of currently
> scheduled TXQs. However, hopefully this number rarely grows too big (it's
> only TXQs currently backlogged, not all associated stations), so it
> shouldn't be too big of an issue.
>
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
> This is basically the idea I mentioned earlier for a different way to
> handle the airtime scheduling.
>
> I've tested it on ath9k, where it achieves the same fairness and
> weighing properties as the old scheduler. It would be good if you could
> test it on your ath10k setup, Rajkumar; and all of you please comment on
> whether you agree that this is better from an API point of view.

So no one has any comments on this? :)

-Toke
Rajkumar Manoharan March 6, 2019, 11:09 p.m. UTC | #3
On 2019-03-05 07:45, Toke Høiland-Jørgensen wrote:
> Toke Høiland-Jørgensen <toke@redhat.com> writes:
> 
>> This switches the airtime scheduler in mac80211 to use a virtual 
>> time-based
>> scheduler instead of the round-robin scheduler used before. This has a
>> couple of advantages:
>> 
>> - No need to sync up the round-robin scheduler in firmware/hardware 
>> with
>>   the round-robin airtime scheduler.
>> 
>> - If several stations are eligible for transmission we can schedule 
>> both of
>>   them; no need to hard-block the scheduling rotation until the head 
>> of the
>>   queue has used up its quantum.
>> 
>> - The check of whether a station is eligible for transmission becomes
>>   simpler (in ieee80211_txq_may_transmit()).
>> 
>> The drawback is that scheduling becomes slightly more expensive, as we 
>> need
>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>> ieee80211_register_airtime() becomes O(logN) in the number of 
>> currently
>> scheduled TXQs. However, hopefully this number rarely grows too big 
>> (it's
>> only TXQs currently backlogged, not all associated stations), so it
>> shouldn't be too big of an issue.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>> This is basically the idea I mentioned earlier for a different way to
>> handle the airtime scheduling.
>> 
>> I've tested it on ath9k, where it achieves the same fairness and
>> weighing properties as the old scheduler. It would be good if you 
>> could
>> test it on your ath10k setup, Rajkumar; and all of you please comment 
>> on
>> whether you agree that this is better from an API point of view.
> 
> So no one has any comments on this? :)
> 
Toke,

This is kind of design change. ;) FMU w.r.t ath10k, earlier deficit 
adjustment
and list rotation happens at next_txq and may_transmit. Now it seems the 
rbtree
adjustment happens upon new txq insertion through wake_txq and whenever 
driver
reports airtime by register_airtime. Am I right?

We are using pretty old kernel (3.14, 4.4). It definitely needs backport 
of rbtree.
Have you used *Wrt image or validation on x86?

-Rajkumar
Toke Høiland-Jørgensen March 7, 2019, 9:46 a.m. UTC | #4
Rajkumar Manoharan <rmanohar@codeaurora.org> writes:

> On 2019-03-05 07:45, Toke Høiland-Jørgensen wrote:
>> Toke Høiland-Jørgensen <toke@redhat.com> writes:
>> 
>>> This switches the airtime scheduler in mac80211 to use a virtual 
>>> time-based
>>> scheduler instead of the round-robin scheduler used before. This has a
>>> couple of advantages:
>>> 
>>> - No need to sync up the round-robin scheduler in firmware/hardware 
>>> with
>>>   the round-robin airtime scheduler.
>>> 
>>> - If several stations are eligible for transmission we can schedule 
>>> both of
>>>   them; no need to hard-block the scheduling rotation until the head 
>>> of the
>>>   queue has used up its quantum.
>>> 
>>> - The check of whether a station is eligible for transmission becomes
>>>   simpler (in ieee80211_txq_may_transmit()).
>>> 
>>> The drawback is that scheduling becomes slightly more expensive, as we 
>>> need
>>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>>> ieee80211_register_airtime() becomes O(logN) in the number of 
>>> currently
>>> scheduled TXQs. However, hopefully this number rarely grows too big 
>>> (it's
>>> only TXQs currently backlogged, not all associated stations), so it
>>> shouldn't be too big of an issue.
>>> 
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> ---
>>> This is basically the idea I mentioned earlier for a different way to
>>> handle the airtime scheduling.
>>> 
>>> I've tested it on ath9k, where it achieves the same fairness and
>>> weighing properties as the old scheduler. It would be good if you 
>>> could
>>> test it on your ath10k setup, Rajkumar; and all of you please comment 
>>> on
>>> whether you agree that this is better from an API point of view.
>> 
>> So no one has any comments on this? :)
>> 
> Toke,
>
> This is kind of design change. ;)

Yup, that was kinda the point ;)

> FMU w.r.t ath10k, earlier deficit adjustment and list rotation happens
> at next_txq and may_transmit. Now it seems the rbtree adjustment
> happens upon new txq insertion through wake_txq and whenever driver
> reports airtime by register_airtime. Am I right?

Yes. This change was, in part, motivated by the discussions we had
around your patches to ath10k, and the need to make the round-robin
schedulers sync up. I consider that a bit of a hack which this approach
avoids.

> We are using pretty old kernel (3.14, 4.4). It definitely needs backport 
> of rbtree.

Well I wasn't necessarily planning for this to be backported. It doesn't
change the driver API, and *in theory the aggregate fairness behaviour
should be the same between the two scheduling approaches (famous last
words). Verifying that this is actually the case is why I'm asking for
testing.

Also, isn't rbtree pretty old? A git blame on rbtree.c suggests it goes
all the way back to the 2.6 era...

> Have you used *Wrt image or validation on x86?

I have only tested this on x86 with a mac80211-next-based kernel...

-Toke
Felix Fietkau March 7, 2019, 2:27 p.m. UTC | #5
On 2019-02-15 18:05, Toke Høiland-Jørgensen wrote:
> This switches the airtime scheduler in mac80211 to use a virtual time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:
> 
> - No need to sync up the round-robin scheduler in firmware/hardware with
>   the round-robin airtime scheduler.
> 
> - If several stations are eligible for transmission we can schedule both of
>   them; no need to hard-block the scheduling rotation until the head of the
>   queue has used up its quantum.
> 
> - The check of whether a station is eligible for transmission becomes
>   simpler (in ieee80211_txq_may_transmit()).
> 
> The drawback is that scheduling becomes slightly more expensive, as we need
> to maintain an rbtree of TXQs sorted by virtual time. This means that
> ieee80211_register_airtime() becomes O(logN) in the number of currently
> scheduled TXQs. However, hopefully this number rarely grows too big (it's
> only TXQs currently backlogged, not all associated stations), so it
> shouldn't be too big of an issue.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
The approach looks good to me, but I haven't really reviewed it very
carefully yet. Just some points that I noticed below:

> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
> index 11f058987a54..9d01fdd86e2d 100644
> --- a/net/mac80211/sta_info.c
> +++ b/net/mac80211/sta_info.c
> @@ -389,7 +389,6 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
>  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>  		skb_queue_head_init(&sta->ps_tx_buf[i]);
>  		skb_queue_head_init(&sta->tx_filtered[i]);
> -		sta->airtime[i].deficit = sta->airtime_weight;
>  	}
>  
>  	for (i = 0; i < IEEE80211_NUM_TIDS; i++)
> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
>  {
>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>  	struct ieee80211_local *local = sta->sdata->local;
> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>  	u8 ac = ieee80211_ac_from_tid(tid);
> -	u32 airtime = 0;
> +	u64 airtime = 0, weight_sum;
> +
> +	if (!txq)
> +		return;
>  
>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>  		airtime += tx_airtime;
>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>  		airtime += rx_airtime;
>  
> +	/* Weights scale so the unit weight is 256 */
> +	airtime <<= 8;
> +
>  	spin_lock_bh(&local->active_txq_lock[ac]);
> +
>  	sta->airtime[ac].tx_airtime += tx_airtime;
>  	sta->airtime[ac].rx_airtime += rx_airtime;
> -	sta->airtime[ac].deficit -= airtime;
> +
> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
> +
> +	local->airtime_v_t[ac] += airtime / weight_sum;
> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
> +	ieee80211_resort_txq(&local->hw, txq);
These divisions could be a bit expensive, any way to change the
calculation to avoid them?

> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> -void ieee80211_return_txq(struct ieee80211_hw *hw,
> +static void __ieee80211_insert_txq(struct rb_root_cached *root,
> +				   struct txq_info *txqi, u8 ac)
> +{
> +	struct rb_node **new = &root->rb_root.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct txq_info *__txqi;
> +	bool leftmost = true;
> +
> +	while (*new) {
> +		parent = *new;
> +		__txqi = rb_entry(parent, struct txq_info, schedule_order);
> +
> +		if (!txqi->txq.sta) {
> +			/* new txqi has no sta - insert to the left */
> +			new = &parent->rb_left;
> +		} else if (!__txqi->txq.sta) {
> +			/* existing txqi has no sta - insert to the right */
> +			new = &parent->rb_right;
> +			leftmost = false;
> +		} else {
> +			struct sta_info *old_sta = container_of(__txqi->txq.sta,
> +								struct sta_info,
> +								sta);
> +			struct sta_info *new_sta = container_of(txqi->txq.sta,
> +								struct sta_info,
> +								sta);
> +
> +			if (new_sta->airtime[ac].v_t <= old_sta->airtime[ac].v_t)
> +				new = &parent->rb_left;
> +			else {
> +				new = &parent->rb_right;
> +				leftmost = false;
> +			}
> +
> +		}
> +	}
> +
> +	rb_link_node(&txqi->schedule_order, parent, new);
> +	rb_insert_color_cached(&txqi->schedule_order, root, leftmost);
> +}
I'm a bit worried about this part. Does that mean that vif txqs always
have priority over sta txqs?

- Felix
Toke Høiland-Jørgensen March 8, 2019, 11:05 a.m. UTC | #6
Felix Fietkau <nbd@nbd.name> writes:

> On 2019-02-15 18:05, Toke Høiland-Jørgensen wrote:
>> This switches the airtime scheduler in mac80211 to use a virtual time-based
>> scheduler instead of the round-robin scheduler used before. This has a
>> couple of advantages:
>> 
>> - No need to sync up the round-robin scheduler in firmware/hardware with
>>   the round-robin airtime scheduler.
>> 
>> - If several stations are eligible for transmission we can schedule both of
>>   them; no need to hard-block the scheduling rotation until the head of the
>>   queue has used up its quantum.
>> 
>> - The check of whether a station is eligible for transmission becomes
>>   simpler (in ieee80211_txq_may_transmit()).
>> 
>> The drawback is that scheduling becomes slightly more expensive, as we need
>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>> ieee80211_register_airtime() becomes O(logN) in the number of currently
>> scheduled TXQs. However, hopefully this number rarely grows too big (it's
>> only TXQs currently backlogged, not all associated stations), so it
>> shouldn't be too big of an issue.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> The approach looks good to me, but I haven't really reviewed it very
> carefully yet. Just some points that I noticed below:

Cool!

>> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
>> index 11f058987a54..9d01fdd86e2d 100644
>> --- a/net/mac80211/sta_info.c
>> +++ b/net/mac80211/sta_info.c
>> @@ -389,7 +389,6 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
>>  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>>  		skb_queue_head_init(&sta->ps_tx_buf[i]);
>>  		skb_queue_head_init(&sta->tx_filtered[i]);
>> -		sta->airtime[i].deficit = sta->airtime_weight;
>>  	}
>>  
>>  	for (i = 0; i < IEEE80211_NUM_TIDS; i++)
>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
>>  {
>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>  	struct ieee80211_local *local = sta->sdata->local;
>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>  	u8 ac = ieee80211_ac_from_tid(tid);
>> -	u32 airtime = 0;
>> +	u64 airtime = 0, weight_sum;
>> +
>> +	if (!txq)
>> +		return;
>>  
>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>  		airtime += tx_airtime;
>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>  		airtime += rx_airtime;
>>  
>> +	/* Weights scale so the unit weight is 256 */
>> +	airtime <<= 8;
>> +
>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>> +
>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>> -	sta->airtime[ac].deficit -= airtime;
>> +
>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>> +
>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
>> +	ieee80211_resort_txq(&local->hw, txq);
> These divisions could be a bit expensive, any way to change the
> calculation to avoid them?

Yeah, given that the denominators are constant from the PoV of the fast
path, we can pre-compute reciprocals and turn these divides into
multiplications. Will incorporate that...

>> --- a/net/mac80211/tx.c
>> +++ b/net/mac80211/tx.c
>> -void ieee80211_return_txq(struct ieee80211_hw *hw,
>> +static void __ieee80211_insert_txq(struct rb_root_cached *root,
>> +				   struct txq_info *txqi, u8 ac)
>> +{
>> +	struct rb_node **new = &root->rb_root.rb_node;
>> +	struct rb_node *parent = NULL;
>> +	struct txq_info *__txqi;
>> +	bool leftmost = true;
>> +
>> +	while (*new) {
>> +		parent = *new;
>> +		__txqi = rb_entry(parent, struct txq_info, schedule_order);
>> +
>> +		if (!txqi->txq.sta) {
>> +			/* new txqi has no sta - insert to the left */
>> +			new = &parent->rb_left;
>> +		} else if (!__txqi->txq.sta) {
>> +			/* existing txqi has no sta - insert to the right */
>> +			new = &parent->rb_right;
>> +			leftmost = false;
>> +		} else {
>> +			struct sta_info *old_sta = container_of(__txqi->txq.sta,
>> +								struct sta_info,
>> +								sta);
>> +			struct sta_info *new_sta = container_of(txqi->txq.sta,
>> +								struct sta_info,
>> +								sta);
>> +
>> +			if (new_sta->airtime[ac].v_t <= old_sta->airtime[ac].v_t)
>> +				new = &parent->rb_left;
>> +			else {
>> +				new = &parent->rb_right;
>> +				leftmost = false;
>> +			}
>> +
>> +		}
>> +	}
>> +
>> +	rb_link_node(&txqi->schedule_order, parent, new);
>> +	rb_insert_color_cached(&txqi->schedule_order, root, leftmost);
>> +}
> I'm a bit worried about this part. Does that mean that vif txqs always
> have priority over sta txqs?

Yeah, it does. This sort of mirrors what the existing airtime scheduler
does (because VIFs don't have an airtime deficit), but because it's a
round-robin scheduler the effect is less severe as long as there are
stations able to transmit.

I guess the obvious fix is to start accounting airtime usage for the VIF
as well? We may want to do that in any case, as that would also give
users a convenient way to set policy for multicast traffic. Any
objections to this?

-Toke
Felix Fietkau March 8, 2019, 6:16 p.m. UTC | #7
On 2019-03-08 12:05, Toke Høiland-Jørgensen wrote:
> Felix Fietkau <nbd@nbd.name> writes:
> 
>> On 2019-02-15 18:05, Toke Høiland-Jørgensen wrote:
>>> This switches the airtime scheduler in mac80211 to use a virtual time-based
>>> scheduler instead of the round-robin scheduler used before. This has a
>>> couple of advantages:
>>> 
>>> - No need to sync up the round-robin scheduler in firmware/hardware with
>>>   the round-robin airtime scheduler.
>>> 
>>> - If several stations are eligible for transmission we can schedule both of
>>>   them; no need to hard-block the scheduling rotation until the head of the
>>>   queue has used up its quantum.
>>> 
>>> - The check of whether a station is eligible for transmission becomes
>>>   simpler (in ieee80211_txq_may_transmit()).
>>> 
>>> The drawback is that scheduling becomes slightly more expensive, as we need
>>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>>> ieee80211_register_airtime() becomes O(logN) in the number of currently
>>> scheduled TXQs. However, hopefully this number rarely grows too big (it's
>>> only TXQs currently backlogged, not all associated stations), so it
>>> shouldn't be too big of an issue.
>>> 
>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> The approach looks good to me, but I haven't really reviewed it very
>> carefully yet. Just some points that I noticed below:
> 
> Cool!
> 
>>> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
>>> index 11f058987a54..9d01fdd86e2d 100644
>>> --- a/net/mac80211/sta_info.c
>>> +++ b/net/mac80211/sta_info.c
>>> @@ -389,7 +389,6 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
>>>  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>>>  		skb_queue_head_init(&sta->ps_tx_buf[i]);
>>>  		skb_queue_head_init(&sta->tx_filtered[i]);
>>> -		sta->airtime[i].deficit = sta->airtime_weight;
>>>  	}
>>>  
>>>  	for (i = 0; i < IEEE80211_NUM_TIDS; i++)
>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
>>>  {
>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>>  	struct ieee80211_local *local = sta->sdata->local;
>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>> -	u32 airtime = 0;
>>> +	u64 airtime = 0, weight_sum;
>>> +
>>> +	if (!txq)
>>> +		return;
>>>  
>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>  		airtime += tx_airtime;
>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>  		airtime += rx_airtime;
>>>  
>>> +	/* Weights scale so the unit weight is 256 */
>>> +	airtime <<= 8;
>>> +
>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>> +
>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>> -	sta->airtime[ac].deficit -= airtime;
>>> +
>>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>>> +
>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
>>> +	ieee80211_resort_txq(&local->hw, txq);
>> These divisions could be a bit expensive, any way to change the
>> calculation to avoid them?
> 
> Yeah, given that the denominators are constant from the PoV of the fast
> path, we can pre-compute reciprocals and turn these divides into
> multiplications. Will incorporate that...
Sounds good.

>> I'm a bit worried about this part. Does that mean that vif txqs always
>> have priority over sta txqs?
> 
> Yeah, it does. This sort of mirrors what the existing airtime scheduler
> does (because VIFs don't have an airtime deficit), but because it's a
> round-robin scheduler the effect is less severe as long as there are
> stations able to transmit.
> 
> I guess the obvious fix is to start accounting airtime usage for the VIF
> as well? We may want to do that in any case, as that would also give
> users a convenient way to set policy for multicast traffic. Any
> objections to this?
I think this is a good idea.

- Felix
Toke Høiland-Jørgensen March 8, 2019, 7:06 p.m. UTC | #8
Felix Fietkau <nbd@nbd.name> writes:

> On 2019-03-08 12:05, Toke Høiland-Jørgensen wrote:
>> Felix Fietkau <nbd@nbd.name> writes:
>> 
>>> On 2019-02-15 18:05, Toke Høiland-Jørgensen wrote:
>>>> This switches the airtime scheduler in mac80211 to use a virtual time-based
>>>> scheduler instead of the round-robin scheduler used before. This has a
>>>> couple of advantages:
>>>> 
>>>> - No need to sync up the round-robin scheduler in firmware/hardware with
>>>>   the round-robin airtime scheduler.
>>>> 
>>>> - If several stations are eligible for transmission we can schedule both of
>>>>   them; no need to hard-block the scheduling rotation until the head of the
>>>>   queue has used up its quantum.
>>>> 
>>>> - The check of whether a station is eligible for transmission becomes
>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>> 
>>>> The drawback is that scheduling becomes slightly more expensive, as we need
>>>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>>>> ieee80211_register_airtime() becomes O(logN) in the number of currently
>>>> scheduled TXQs. However, hopefully this number rarely grows too big (it's
>>>> only TXQs currently backlogged, not all associated stations), so it
>>>> shouldn't be too big of an issue.
>>>> 
>>>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>>> The approach looks good to me, but I haven't really reviewed it very
>>> carefully yet. Just some points that I noticed below:
>> 
>> Cool!
>> 
>>>> diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
>>>> index 11f058987a54..9d01fdd86e2d 100644
>>>> --- a/net/mac80211/sta_info.c
>>>> +++ b/net/mac80211/sta_info.c
>>>> @@ -389,7 +389,6 @@ struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
>>>>  	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
>>>>  		skb_queue_head_init(&sta->ps_tx_buf[i]);
>>>>  		skb_queue_head_init(&sta->tx_filtered[i]);
>>>> -		sta->airtime[i].deficit = sta->airtime_weight;
>>>>  	}
>>>>  
>>>>  	for (i = 0; i < IEEE80211_NUM_TIDS; i++)
>>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
>>>>  {
>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>> -	u32 airtime = 0;
>>>> +	u64 airtime = 0, weight_sum;
>>>> +
>>>> +	if (!txq)
>>>> +		return;
>>>>  
>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>  		airtime += tx_airtime;
>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>  		airtime += rx_airtime;
>>>>  
>>>> +	/* Weights scale so the unit weight is 256 */
>>>> +	airtime <<= 8;
>>>> +
>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>> +
>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>> -	sta->airtime[ac].deficit -= airtime;
>>>> +
>>>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>>>> +
>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
>>>> +	ieee80211_resort_txq(&local->hw, txq);
>>> These divisions could be a bit expensive, any way to change the
>>> calculation to avoid them?
>> 
>> Yeah, given that the denominators are constant from the PoV of the fast
>> path, we can pre-compute reciprocals and turn these divides into
>> multiplications. Will incorporate that...
> Sounds good.
>
>>> I'm a bit worried about this part. Does that mean that vif txqs always
>>> have priority over sta txqs?
>> 
>> Yeah, it does. This sort of mirrors what the existing airtime scheduler
>> does (because VIFs don't have an airtime deficit), but because it's a
>> round-robin scheduler the effect is less severe as long as there are
>> stations able to transmit.
>> 
>> I guess the obvious fix is to start accounting airtime usage for the VIF
>> as well? We may want to do that in any case, as that would also give
>> users a convenient way to set policy for multicast traffic. Any
>> objections to this?
> I think this is a good idea.

Great, thanks! I'll do a separate patch for accounting airtime for the
VIF queue, and then respin this on top. Will probably be a little while
before I get around to it, though.

-Toke
Yibo Zhao April 4, 2019, 4:41 a.m. UTC | #9
On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
> This switches the airtime scheduler in mac80211 to use a virtual 
> time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:
> 
> - No need to sync up the round-robin scheduler in firmware/hardware 
> with
>   the round-robin airtime scheduler.
> 
> - If several stations are eligible for transmission we can schedule 
> both of
>   them; no need to hard-block the scheduling rotation until the head of 
> the
>   queue has used up its quantum.
> 
> - The check of whether a station is eligible for transmission becomes
>   simpler (in ieee80211_txq_may_transmit()).
> 
> The drawback is that scheduling becomes slightly more expensive, as we 
> need
> to maintain an rbtree of TXQs sorted by virtual time. This means that
> ieee80211_register_airtime() becomes O(logN) in the number of currently
> scheduled TXQs. However, hopefully this number rarely grows too big 
> (it's
> only TXQs currently backlogged, not all associated stations), so it
> shouldn't be too big of an issue.
> 
> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
> ---
> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
> ieee80211_sta *pubsta, u8 tid,
>  {
>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>  	struct ieee80211_local *local = sta->sdata->local;
> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>  	u8 ac = ieee80211_ac_from_tid(tid);
> -	u32 airtime = 0;
> +	u64 airtime = 0, weight_sum;
> +
> +	if (!txq)
> +		return;
> 
>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>  		airtime += tx_airtime;
>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>  		airtime += rx_airtime;
> 
> +	/* Weights scale so the unit weight is 256 */
> +	airtime <<= 8;
> +
>  	spin_lock_bh(&local->active_txq_lock[ac]);
> +
>  	sta->airtime[ac].tx_airtime += tx_airtime;
>  	sta->airtime[ac].rx_airtime += rx_airtime;
> -	sta->airtime[ac].deficit -= airtime;
> +
> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
> +
> +	local->airtime_v_t[ac] += airtime / weight_sum;
Hi Toke,

it looks like local->airtime_v_t acts like a Tx criteria. Only the 
stations with less airtime than that are valid for Tx. That means there 
are situations, like 50 clients, that some of the stations can be used 
to Tx when putting next_txq in the loop. Am I right?

> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;

Another question, any plan for taking v_t overflow situation into 
consideration? u64 might be enough for low throughput products but not 
sure for high end products. Something like below:
     u64 prev = sta->airtime[ac].v_t;

     if ()

> +	ieee80211_resort_txq(&local->hw, txq);
> +
>  	spin_unlock_bh(&local->active_txq_lock[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_sta_register_airtime);
> diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
> index 05647d835894..4b901a2a998e 100644
> --- a/net/mac80211/sta_info.h
> +++ b/net/mac80211/sta_info.h
> @@ -130,11 +130,12 @@ enum ieee80211_agg_stop_reason {
>  /* Debugfs flags to enable/disable use of RX/TX airtime in scheduler 
> */
>  #define AIRTIME_USE_TX		BIT(0)
>  #define AIRTIME_USE_RX		BIT(1)
> +#define AIRTIME_GRACE 500 /* usec of grace period before reset */
> 
>  struct airtime_info {
>  	u64 rx_airtime;
>  	u64 tx_airtime;
> -	s64 deficit;
> +	u64 v_t;
>  };
> 
>  struct sta_info;
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 61c7ea9de2cc..8e125df8ade0 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1449,7 +1449,7 @@ void ieee80211_txq_init(struct
> ieee80211_sub_if_data *sdata,
>  	codel_vars_init(&txqi->def_cvars);
>  	codel_stats_init(&txqi->cstats);
>  	__skb_queue_head_init(&txqi->frags);
> -	INIT_LIST_HEAD(&txqi->schedule_order);
> +	RB_CLEAR_NODE(&txqi->schedule_order);
> 
>  	txqi->txq.vif = &sdata->vif;
> 
> @@ -1493,9 +1493,7 @@ void ieee80211_txq_purge(struct ieee80211_local 
> *local,
>  	ieee80211_purge_tx_queue(&local->hw, &txqi->frags);
>  	spin_unlock_bh(&fq->lock);
> 
> -	spin_lock_bh(&local->active_txq_lock[txqi->txq.ac]);
> -	list_del_init(&txqi->schedule_order);
> -	spin_unlock_bh(&local->active_txq_lock[txqi->txq.ac]);
> +	ieee80211_unschedule_txq(&local->hw, &txqi->txq);
>  }
> 
>  void ieee80211_txq_set_params(struct ieee80211_local *local)
> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 
> ac)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> +	struct rb_node *node = local->schedule_pos[ac];
>  	struct txq_info *txqi = NULL;
> +	bool first = false;
> 
>  	lockdep_assert_held(&local->active_txq_lock[ac]);
> 
> - begin:
> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
> -					struct txq_info,
> -					schedule_order);
> -	if (!txqi)
> +	if (!node) {
> +		node = rb_first_cached(&local->active_txqs[ac]);
> +		first = true;
> +	} else
> +		node = rb_next(node);
> +
> +	if (!node)
>  		return NULL;
> 
> +	txqi = container_of(node, struct txq_info, schedule_order);
> +
>  	if (txqi->txq.sta) {
>  		struct sta_info *sta = container_of(txqi->txq.sta,
>  						struct sta_info, sta);
> 
> -		if (sta->airtime[txqi->txq.ac].deficit < 0) {
> -			sta->airtime[txqi->txq.ac].deficit +=
> -				sta->airtime_weight;
> -			list_move_tail(&txqi->schedule_order,
> -				       &local->active_txqs[txqi->txq.ac]);
> -			goto begin;
> +		if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
> +			if (first)
> +				local->airtime_v_t[ac] = sta->airtime[ac].v_t;
> +			else
> +				return NULL;
>  		}
>  	}
> 
> 
> -	if (txqi->schedule_round == local->schedule_round[ac])
> -		return NULL;
> -
> -	list_del_init(&txqi->schedule_order);
> -	txqi->schedule_round = local->schedule_round[ac];
> +	local->schedule_pos[ac] = node;
>  	return &txqi->txq;
>  }
>  EXPORT_SYMBOL(ieee80211_next_txq);
> 
> -void ieee80211_return_txq(struct ieee80211_hw *hw,
> +static void __ieee80211_insert_txq(struct rb_root_cached *root,
> +				   struct txq_info *txqi, u8 ac)
> +{
> +	struct rb_node **new = &root->rb_root.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct txq_info *__txqi;
> +	bool leftmost = true;
> +
> +	while (*new) {
> +		parent = *new;
> +		__txqi = rb_entry(parent, struct txq_info, schedule_order);
> +
> +		if (!txqi->txq.sta) {
> +			/* new txqi has no sta - insert to the left */
> +			new = &parent->rb_left;
> +		} else if (!__txqi->txq.sta) {
> +			/* existing txqi has no sta - insert to the right */
> +			new = &parent->rb_right;
> +			leftmost = false;
> +		} else {
> +			struct sta_info *old_sta = container_of(__txqi->txq.sta,
> +								struct sta_info,
> +								sta);
> +			struct sta_info *new_sta = container_of(txqi->txq.sta,
> +								struct sta_info,
> +								sta);
> +
> +			if (new_sta->airtime[ac].v_t <= old_sta->airtime[ac].v_t)
> +				new = &parent->rb_left;
> +			else {
> +				new = &parent->rb_right;
> +				leftmost = false;
> +			}
> +
> +		}
> +	}
> +
> +	rb_link_node(&txqi->schedule_order, parent, new);
> +	rb_insert_color_cached(&txqi->schedule_order, root, leftmost);
> +}
> +
> +void ieee80211_schedule_txq(struct ieee80211_hw *hw,
> +			    struct ieee80211_txq *txq)
> +	__acquires(txq_lock) __releases(txq_lock)
> +{
> +	struct ieee80211_local *local = hw_to_local(hw);
> +	struct txq_info *txqi = to_txq_info(txq);
> +	u8 ac = txq->ac;
> +
> +	spin_lock_bh(&local->active_txq_lock[ac]);
> +
> +	if (!RB_EMPTY_NODE(&txqi->schedule_order))
> +		goto out;
> +
> +	if (txq->sta) {
> +		struct sta_info *sta = container_of(txq->sta,
> +						    struct sta_info, sta);
> +
> +		local->airtime_weight_sum[ac] += sta->airtime_weight;
> +		if (local->airtime_v_t[ac] > AIRTIME_GRACE)
> +			sta->airtime[ac].v_t = max(local->airtime_v_t[ac] - AIRTIME_GRACE,
> +						   sta->airtime[ac].v_t);
> +	}
> +
> +	__ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
> +
> + out:
> +	spin_unlock_bh(&local->active_txq_lock[ac]);
> +}
> +EXPORT_SYMBOL(ieee80211_schedule_txq);
> +
> +void ieee80211_resort_txq(struct ieee80211_hw *hw,
>  			  struct ieee80211_txq *txq)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
>  	struct txq_info *txqi = to_txq_info(txq);
> +	u8 ac = txq->ac;
> +
> +	if (!RB_EMPTY_NODE(&txqi->schedule_order)) {
> +		rb_erase_cached(&txqi->schedule_order,
> +				&local->active_txqs[ac]);
> +		RB_CLEAR_NODE(&txqi->schedule_order);
> +		__ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
> +	}
> +}
> +
> +static void __ieee80211_unschedule_txq(struct ieee80211_hw *hw,
> +				       struct ieee80211_txq *txq)
> +{
> +	struct ieee80211_local *local = hw_to_local(hw);
> +	struct txq_info *txqi = to_txq_info(txq);
> +	u8 ac = txq->ac;
> 
>  	lockdep_assert_held(&local->active_txq_lock[txq->ac]);
> 
> -	if (list_empty(&txqi->schedule_order) &&
> -	    (!skb_queue_empty(&txqi->frags) || txqi->tin.backlog_packets)) {
> -		/* If airtime accounting is active, always enqueue STAs at the
> -		 * head of the list to ensure that they only get moved to the
> -		 * back by the airtime DRR scheduler once they have a negative
> -		 * deficit. A station that already has a negative deficit will
> -		 * get immediately moved to the back of the list on the next
> -		 * call to ieee80211_next_txq().
> -		 */
> -		if (txqi->txq.sta &&
> -		    wiphy_ext_feature_isset(local->hw.wiphy,
> -					    NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
> -			list_add(&txqi->schedule_order,
> -				 &local->active_txqs[txq->ac]);
> -		else
> -			list_add_tail(&txqi->schedule_order,
> -				      &local->active_txqs[txq->ac]);
> +	if (RB_EMPTY_NODE(&txqi->schedule_order))
> +		return;
> +
> +	if (txq->sta) {
> +		struct sta_info *sta = container_of(txq->sta,
> +						    struct sta_info, sta);
> +
> +		local->airtime_weight_sum[ac] -= sta->airtime_weight;
>  	}
> +
> +	rb_erase_cached(&txqi->schedule_order,
> +			&local->active_txqs[txq->ac]);
> +	RB_CLEAR_NODE(&txqi->schedule_order);
>  }
> -EXPORT_SYMBOL(ieee80211_return_txq);
> 
> -void ieee80211_schedule_txq(struct ieee80211_hw *hw,
> -			    struct ieee80211_txq *txq)
> +void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
> +			      struct ieee80211_txq *txq)
>  	__acquires(txq_lock) __releases(txq_lock)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> 
>  	spin_lock_bh(&local->active_txq_lock[txq->ac]);
> -	ieee80211_return_txq(hw, txq);
> +	__ieee80211_unschedule_txq(hw, txq);
>  	spin_unlock_bh(&local->active_txq_lock[txq->ac]);
>  }
> -EXPORT_SYMBOL(ieee80211_schedule_txq);
> +
> +void ieee80211_return_txq(struct ieee80211_hw *hw,
> +			  struct ieee80211_txq *txq)
> +{
> +	struct ieee80211_local *local = hw_to_local(hw);
> +	struct txq_info *txqi = to_txq_info(txq);
> +
> +	lockdep_assert_held(&local->active_txq_lock[txq->ac]);
> +
> +	if (!RB_EMPTY_NODE(&txqi->schedule_order) &&
> +	    (skb_queue_empty(&txqi->frags) && !txqi->tin.backlog_packets))
> +		__ieee80211_unschedule_txq(hw, txq);
> +}
> +EXPORT_SYMBOL(ieee80211_return_txq);
> 
>  bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
>  				struct ieee80211_txq *txq)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> -	struct txq_info *iter, *tmp, *txqi = to_txq_info(txq);
> +	struct txq_info *txqi = to_txq_info(txq);
>  	struct sta_info *sta;
>  	u8 ac = txq->ac;
> 
>  	lockdep_assert_held(&local->active_txq_lock[ac]);
> 
>  	if (!txqi->txq.sta)
> -		goto out;
> -
> -	if (list_empty(&txqi->schedule_order))
> -		goto out;
> -
> -	list_for_each_entry_safe(iter, tmp, &local->active_txqs[ac],
> -				 schedule_order) {
> -		if (iter == txqi)
> -			break;
> -
> -		if (!iter->txq.sta) {
> -			list_move_tail(&iter->schedule_order,
> -				       &local->active_txqs[ac]);
> -			continue;
> -		}
> -		sta = container_of(iter->txq.sta, struct sta_info, sta);
> -		if (sta->airtime[ac].deficit < 0)
> -			sta->airtime[ac].deficit += sta->airtime_weight;
> -		list_move_tail(&iter->schedule_order, &local->active_txqs[ac]);
> -	}
> +		return true;
> 
>  	sta = container_of(txqi->txq.sta, struct sta_info, sta);
> -	if (sta->airtime[ac].deficit >= 0)
> -		goto out;
> -
> -	sta->airtime[ac].deficit += sta->airtime_weight;
> -	list_move_tail(&txqi->schedule_order, &local->active_txqs[ac]);
> -
> -	return false;
> -out:
> -	if (!list_empty(&txqi->schedule_order))
> -		list_del_init(&txqi->schedule_order);
> -
> -	return true;
> +	return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_txq_may_transmit);
> 
> @@ -3769,7 +3832,6 @@ void ieee80211_txq_schedule_start(struct
> ieee80211_hw *hw, u8 ac)
>  	struct ieee80211_local *local = hw_to_local(hw);
> 
>  	spin_lock_bh(&local->active_txq_lock[ac]);
> -	local->schedule_round[ac]++;
>  }
>  EXPORT_SYMBOL(ieee80211_txq_schedule_start);
> 
> @@ -3778,6 +3840,7 @@ void ieee80211_txq_schedule_end(struct
> ieee80211_hw *hw, u8 ac)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> 
> +	local->schedule_pos[ac] = NULL;
>  	spin_unlock_bh(&local->active_txq_lock[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_txq_schedule_end);
Yibo Zhao April 4, 2019, 4:43 a.m. UTC | #10
On 2019-04-04 12:41, Yibo Zhao wrote:
> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>> This switches the airtime scheduler in mac80211 to use a virtual 
>> time-based
>> scheduler instead of the round-robin scheduler used before. This has a
>> couple of advantages:
>> 
>> - No need to sync up the round-robin scheduler in firmware/hardware 
>> with
>>   the round-robin airtime scheduler.
>> 
>> - If several stations are eligible for transmission we can schedule 
>> both of
>>   them; no need to hard-block the scheduling rotation until the head 
>> of the
>>   queue has used up its quantum.
>> 
>> - The check of whether a station is eligible for transmission becomes
>>   simpler (in ieee80211_txq_may_transmit()).
>> 
>> The drawback is that scheduling becomes slightly more expensive, as we 
>> need
>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>> ieee80211_register_airtime() becomes O(logN) in the number of 
>> currently
>> scheduled TXQs. However, hopefully this number rarely grows too big 
>> (it's
>> only TXQs currently backlogged, not all associated stations), so it
>> shouldn't be too big of an issue.
>> 
>> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
>> ---
>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
>> ieee80211_sta *pubsta, u8 tid,
>>  {
>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>  	struct ieee80211_local *local = sta->sdata->local;
>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>  	u8 ac = ieee80211_ac_from_tid(tid);
>> -	u32 airtime = 0;
>> +	u64 airtime = 0, weight_sum;
>> +
>> +	if (!txq)
>> +		return;
>> 
>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>  		airtime += tx_airtime;
>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>  		airtime += rx_airtime;
>> 
>> +	/* Weights scale so the unit weight is 256 */
>> +	airtime <<= 8;
>> +
>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>> +
>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>> -	sta->airtime[ac].deficit -= airtime;
>> +
>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>> +
>> +	local->airtime_v_t[ac] += airtime / weight_sum;
> Hi Toke,
> 
> it looks like local->airtime_v_t acts like a Tx criteria. Only the
> stations with less airtime than that are valid for Tx. That means
> there are situations, like 50 clients, that some of the stations can
> be used to Tx when putting next_txq in the loop. Am I right?
> 
>> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
> 
Another question, any plan for taking v_t overflow situation into
consideration? u64 might be enough for low throughput products but not
sure for high end products. Something like below:
u64 prev = sta->airtime[ac].v_t;

      if ()

>> +	ieee80211_resort_txq(&local->hw, txq);
>> +
>>  	spin_unlock_bh(&local->active_txq_lock[ac]);
>>  }
>>  EXPORT_SYMBOL(ieee80211_sta_register_airtime);
>> diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
>> index 05647d835894..4b901a2a998e 100644
>> --- a/net/mac80211/sta_info.h
>> +++ b/net/mac80211/sta_info.h
>> @@ -130,11 +130,12 @@ enum ieee80211_agg_stop_reason {
>>  /* Debugfs flags to enable/disable use of RX/TX airtime in scheduler 
>> */
>>  #define AIRTIME_USE_TX		BIT(0)
>>  #define AIRTIME_USE_RX		BIT(1)
>> +#define AIRTIME_GRACE 500 /* usec of grace period before reset */
>> 
>>  struct airtime_info {
>>  	u64 rx_airtime;
>>  	u64 tx_airtime;
>> -	s64 deficit;
>> +	u64 v_t;
>>  };
>> 
>>  struct sta_info;
>> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
>> index 61c7ea9de2cc..8e125df8ade0 100644
>> --- a/net/mac80211/tx.c
>> +++ b/net/mac80211/tx.c
>> @@ -1449,7 +1449,7 @@ void ieee80211_txq_init(struct
>> ieee80211_sub_if_data *sdata,
>>  	codel_vars_init(&txqi->def_cvars);
>>  	codel_stats_init(&txqi->cstats);
>>  	__skb_queue_head_init(&txqi->frags);
>> -	INIT_LIST_HEAD(&txqi->schedule_order);
>> +	RB_CLEAR_NODE(&txqi->schedule_order);
>> 
>>  	txqi->txq.vif = &sdata->vif;
>> 
>> @@ -1493,9 +1493,7 @@ void ieee80211_txq_purge(struct ieee80211_local 
>> *local,
>>  	ieee80211_purge_tx_queue(&local->hw, &txqi->frags);
>>  	spin_unlock_bh(&fq->lock);
>> 
>> -	spin_lock_bh(&local->active_txq_lock[txqi->txq.ac]);
>> -	list_del_init(&txqi->schedule_order);
>> -	spin_unlock_bh(&local->active_txq_lock[txqi->txq.ac]);
>> +	ieee80211_unschedule_txq(&local->hw, &txqi->txq);
>>  }
>> 
>>  void ieee80211_txq_set_params(struct ieee80211_local *local)
>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 
>> ac)
>>  {
>>  	struct ieee80211_local *local = hw_to_local(hw);
>> +	struct rb_node *node = local->schedule_pos[ac];
>>  	struct txq_info *txqi = NULL;
>> +	bool first = false;
>> 
>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>> 
>> - begin:
>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>> -					struct txq_info,
>> -					schedule_order);
>> -	if (!txqi)
>> +	if (!node) {
>> +		node = rb_first_cached(&local->active_txqs[ac]);
>> +		first = true;
>> +	} else
>> +		node = rb_next(node);
>> +
>> +	if (!node)
>>  		return NULL;
>> 
>> +	txqi = container_of(node, struct txq_info, schedule_order);
>> +
>>  	if (txqi->txq.sta) {
>>  		struct sta_info *sta = container_of(txqi->txq.sta,
>>  						struct sta_info, sta);
>> 
>> -		if (sta->airtime[txqi->txq.ac].deficit < 0) {
>> -			sta->airtime[txqi->txq.ac].deficit +=
>> -				sta->airtime_weight;
>> -			list_move_tail(&txqi->schedule_order,
>> -				       &local->active_txqs[txqi->txq.ac]);
>> -			goto begin;
>> +		if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
>> +			if (first)
>> +				local->airtime_v_t[ac] = sta->airtime[ac].v_t;
>> +			else
>> +				return NULL;
>>  		}
>>  	}
>> 
>> 
>> -	if (txqi->schedule_round == local->schedule_round[ac])
>> -		return NULL;
>> -
>> -	list_del_init(&txqi->schedule_order);
>> -	txqi->schedule_round = local->schedule_round[ac];
>> +	local->schedule_pos[ac] = node;
>>  	return &txqi->txq;
>>  }
>>  EXPORT_SYMBOL(ieee80211_next_txq);
>> 
>> -void ieee80211_return_txq(struct ieee80211_hw *hw,
>> +static void __ieee80211_insert_txq(struct rb_root_cached *root,
>> +				   struct txq_info *txqi, u8 ac)
>> +{
>> +	struct rb_node **new = &root->rb_root.rb_node;
>> +	struct rb_node *parent = NULL;
>> +	struct txq_info *__txqi;
>> +	bool leftmost = true;
>> +
>> +	while (*new) {
>> +		parent = *new;
>> +		__txqi = rb_entry(parent, struct txq_info, schedule_order);
>> +
>> +		if (!txqi->txq.sta) {
>> +			/* new txqi has no sta - insert to the left */
>> +			new = &parent->rb_left;
>> +		} else if (!__txqi->txq.sta) {
>> +			/* existing txqi has no sta - insert to the right */
>> +			new = &parent->rb_right;
>> +			leftmost = false;
>> +		} else {
>> +			struct sta_info *old_sta = container_of(__txqi->txq.sta,
>> +								struct sta_info,
>> +								sta);
>> +			struct sta_info *new_sta = container_of(txqi->txq.sta,
>> +								struct sta_info,
>> +								sta);
>> +
>> +			if (new_sta->airtime[ac].v_t <= old_sta->airtime[ac].v_t)
>> +				new = &parent->rb_left;
>> +			else {
>> +				new = &parent->rb_right;
>> +				leftmost = false;
>> +			}
>> +
>> +		}
>> +	}
>> +
>> +	rb_link_node(&txqi->schedule_order, parent, new);
>> +	rb_insert_color_cached(&txqi->schedule_order, root, leftmost);
>> +}
>> +
>> +void ieee80211_schedule_txq(struct ieee80211_hw *hw,
>> +			    struct ieee80211_txq *txq)
>> +	__acquires(txq_lock) __releases(txq_lock)
>> +{
>> +	struct ieee80211_local *local = hw_to_local(hw);
>> +	struct txq_info *txqi = to_txq_info(txq);
>> +	u8 ac = txq->ac;
>> +
>> +	spin_lock_bh(&local->active_txq_lock[ac]);
>> +
>> +	if (!RB_EMPTY_NODE(&txqi->schedule_order))
>> +		goto out;
>> +
>> +	if (txq->sta) {
>> +		struct sta_info *sta = container_of(txq->sta,
>> +						    struct sta_info, sta);
>> +
>> +		local->airtime_weight_sum[ac] += sta->airtime_weight;
>> +		if (local->airtime_v_t[ac] > AIRTIME_GRACE)
>> +			sta->airtime[ac].v_t = max(local->airtime_v_t[ac] - AIRTIME_GRACE,
>> +						   sta->airtime[ac].v_t);
>> +	}
>> +
>> +	__ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
>> +
>> + out:
>> +	spin_unlock_bh(&local->active_txq_lock[ac]);
>> +}
>> +EXPORT_SYMBOL(ieee80211_schedule_txq);
>> +
>> +void ieee80211_resort_txq(struct ieee80211_hw *hw,
>>  			  struct ieee80211_txq *txq)
>>  {
>>  	struct ieee80211_local *local = hw_to_local(hw);
>>  	struct txq_info *txqi = to_txq_info(txq);
>> +	u8 ac = txq->ac;
>> +
>> +	if (!RB_EMPTY_NODE(&txqi->schedule_order)) {
>> +		rb_erase_cached(&txqi->schedule_order,
>> +				&local->active_txqs[ac]);
>> +		RB_CLEAR_NODE(&txqi->schedule_order);
>> +		__ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
>> +	}
>> +}
>> +
>> +static void __ieee80211_unschedule_txq(struct ieee80211_hw *hw,
>> +				       struct ieee80211_txq *txq)
>> +{
>> +	struct ieee80211_local *local = hw_to_local(hw);
>> +	struct txq_info *txqi = to_txq_info(txq);
>> +	u8 ac = txq->ac;
>> 
>>  	lockdep_assert_held(&local->active_txq_lock[txq->ac]);
>> 
>> -	if (list_empty(&txqi->schedule_order) &&
>> -	    (!skb_queue_empty(&txqi->frags) || txqi->tin.backlog_packets)) {
>> -		/* If airtime accounting is active, always enqueue STAs at the
>> -		 * head of the list to ensure that they only get moved to the
>> -		 * back by the airtime DRR scheduler once they have a negative
>> -		 * deficit. A station that already has a negative deficit will
>> -		 * get immediately moved to the back of the list on the next
>> -		 * call to ieee80211_next_txq().
>> -		 */
>> -		if (txqi->txq.sta &&
>> -		    wiphy_ext_feature_isset(local->hw.wiphy,
>> -					    NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
>> -			list_add(&txqi->schedule_order,
>> -				 &local->active_txqs[txq->ac]);
>> -		else
>> -			list_add_tail(&txqi->schedule_order,
>> -				      &local->active_txqs[txq->ac]);
>> +	if (RB_EMPTY_NODE(&txqi->schedule_order))
>> +		return;
>> +
>> +	if (txq->sta) {
>> +		struct sta_info *sta = container_of(txq->sta,
>> +						    struct sta_info, sta);
>> +
>> +		local->airtime_weight_sum[ac] -= sta->airtime_weight;
>>  	}
>> +
>> +	rb_erase_cached(&txqi->schedule_order,
>> +			&local->active_txqs[txq->ac]);
>> +	RB_CLEAR_NODE(&txqi->schedule_order);
>>  }
>> -EXPORT_SYMBOL(ieee80211_return_txq);
>> 
>> -void ieee80211_schedule_txq(struct ieee80211_hw *hw,
>> -			    struct ieee80211_txq *txq)
>> +void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
>> +			      struct ieee80211_txq *txq)
>>  	__acquires(txq_lock) __releases(txq_lock)
>>  {
>>  	struct ieee80211_local *local = hw_to_local(hw);
>> 
>>  	spin_lock_bh(&local->active_txq_lock[txq->ac]);
>> -	ieee80211_return_txq(hw, txq);
>> +	__ieee80211_unschedule_txq(hw, txq);
>>  	spin_unlock_bh(&local->active_txq_lock[txq->ac]);
>>  }
>> -EXPORT_SYMBOL(ieee80211_schedule_txq);
>> +
>> +void ieee80211_return_txq(struct ieee80211_hw *hw,
>> +			  struct ieee80211_txq *txq)
>> +{
>> +	struct ieee80211_local *local = hw_to_local(hw);
>> +	struct txq_info *txqi = to_txq_info(txq);
>> +
>> +	lockdep_assert_held(&local->active_txq_lock[txq->ac]);
>> +
>> +	if (!RB_EMPTY_NODE(&txqi->schedule_order) &&
>> +	    (skb_queue_empty(&txqi->frags) && !txqi->tin.backlog_packets))
>> +		__ieee80211_unschedule_txq(hw, txq);
>> +}
>> +EXPORT_SYMBOL(ieee80211_return_txq);
>> 
>>  bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
>>  				struct ieee80211_txq *txq)
>>  {
>>  	struct ieee80211_local *local = hw_to_local(hw);
>> -	struct txq_info *iter, *tmp, *txqi = to_txq_info(txq);
>> +	struct txq_info *txqi = to_txq_info(txq);
>>  	struct sta_info *sta;
>>  	u8 ac = txq->ac;
>> 
>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>> 
>>  	if (!txqi->txq.sta)
>> -		goto out;
>> -
>> -	if (list_empty(&txqi->schedule_order))
>> -		goto out;
>> -
>> -	list_for_each_entry_safe(iter, tmp, &local->active_txqs[ac],
>> -				 schedule_order) {
>> -		if (iter == txqi)
>> -			break;
>> -
>> -		if (!iter->txq.sta) {
>> -			list_move_tail(&iter->schedule_order,
>> -				       &local->active_txqs[ac]);
>> -			continue;
>> -		}
>> -		sta = container_of(iter->txq.sta, struct sta_info, sta);
>> -		if (sta->airtime[ac].deficit < 0)
>> -			sta->airtime[ac].deficit += sta->airtime_weight;
>> -		list_move_tail(&iter->schedule_order, &local->active_txqs[ac]);
>> -	}
>> +		return true;
>> 
>>  	sta = container_of(txqi->txq.sta, struct sta_info, sta);
>> -	if (sta->airtime[ac].deficit >= 0)
>> -		goto out;
>> -
>> -	sta->airtime[ac].deficit += sta->airtime_weight;
>> -	list_move_tail(&txqi->schedule_order, &local->active_txqs[ac]);
>> -
>> -	return false;
>> -out:
>> -	if (!list_empty(&txqi->schedule_order))
>> -		list_del_init(&txqi->schedule_order);
>> -
>> -	return true;
>> +	return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
>>  }
>>  EXPORT_SYMBOL(ieee80211_txq_may_transmit);
>> 
>> @@ -3769,7 +3832,6 @@ void ieee80211_txq_schedule_start(struct
>> ieee80211_hw *hw, u8 ac)
>>  	struct ieee80211_local *local = hw_to_local(hw);
>> 
>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>> -	local->schedule_round[ac]++;
>>  }
>>  EXPORT_SYMBOL(ieee80211_txq_schedule_start);
>> 
>> @@ -3778,6 +3840,7 @@ void ieee80211_txq_schedule_end(struct
>> ieee80211_hw *hw, u8 ac)
>>  {
>>  	struct ieee80211_local *local = hw_to_local(hw);
>> 
>> +	local->schedule_pos[ac] = NULL;
>>  	spin_unlock_bh(&local->active_txq_lock[ac]);
>>  }
>>  EXPORT_SYMBOL(ieee80211_txq_schedule_end);
Yibo Zhao April 4, 2019, 5 a.m. UTC | #11
On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
> This switches the airtime scheduler in mac80211 to use a virtual 
> time-based
> scheduler instead of the round-robin scheduler used before. This has a
> couple of advantages:
> 
> - No need to sync up the round-robin scheduler in firmware/hardware 
> with
>   the round-robin airtime scheduler.
> 
> - If several stations are eligible for transmission we can schedule 
> both of
>   them; no need to hard-block the scheduling rotation until the head of 
> the
>   queue has used up its quantum.
> 
> - The check of whether a station is eligible for transmission becomes
>   simpler (in ieee80211_txq_may_transmit()).
> 
> The drawback is that scheduling becomes slightly more expensive, as we 
> need
> to maintain an rbtree of TXQs sorted by virtual time. This means that
> ieee80211_register_airtime() becomes O(logN) in the number of currently
> scheduled TXQs. However, hopefully this number rarely grows too big 
> (it's
> only TXQs currently backlogged, not all associated stations), so it
> shouldn't be too big of an issue.
> 
> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
> ieee80211_sta *pubsta, u8 tid,
>  {
>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>  	struct ieee80211_local *local = sta->sdata->local;
> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>  	u8 ac = ieee80211_ac_from_tid(tid);
> -	u32 airtime = 0;
> +	u64 airtime = 0, weight_sum;
> +
> +	if (!txq)
> +		return;
> 
>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>  		airtime += tx_airtime;
>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>  		airtime += rx_airtime;
> 
> +	/* Weights scale so the unit weight is 256 */
> +	airtime <<= 8;
> +
>  	spin_lock_bh(&local->active_txq_lock[ac]);
> +
>  	sta->airtime[ac].tx_airtime += tx_airtime;
>  	sta->airtime[ac].rx_airtime += rx_airtime;
> -	sta->airtime[ac].deficit -= airtime;
> +
> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
> +
> +	local->airtime_v_t[ac] += airtime / weight_sum;
Hi Toke,

Please ignore the previous two broken emails regarding this new proposal 
from me.

It looks like local->airtime_v_t acts like a Tx criteria. Only the 
stations with less airtime than that are valid for Tx. That means there 
are situations, like 50 clients, that some of the stations can be used 
to Tx when putting next_txq in the loop. Am I right?

> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
Another question. Any plan for taking v_t overflow situation into 
consideration? u64 might be enough for low throughput products but not 
sure for high end products. Something like below for reference:

     u64 prev = sta->airtime[ac].v_t;
     u64 current = prev + airtime_v_t;

      if (current < prev)
          iterate every rb node to substact the rb_first's vt


> +	ieee80211_resort_txq(&local->hw, txq);
> +
>  	spin_unlock_bh(&local->active_txq_lock[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_sta_register_airtime);
> diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
> index 05647d835894..4b901a2a998e 100644
> --- a/net/mac80211/sta_info.h
> +++ b/net/mac80211/sta_info.h
> @@ -130,11 +130,12 @@ enum ieee80211_agg_stop_reason {
>  /* Debugfs flags to enable/disable use of RX/TX airtime in scheduler 
> */
>  #define AIRTIME_USE_TX		BIT(0)
>  #define AIRTIME_USE_RX		BIT(1)
> +#define AIRTIME_GRACE 500 /* usec of grace period before reset */
> 
>  struct airtime_info {
>  	u64 rx_airtime;
>  	u64 tx_airtime;
> -	s64 deficit;
> +	u64 v_t;
>  };
> 
>  struct sta_info;
> diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
> index 61c7ea9de2cc..8e125df8ade0 100644
> --- a/net/mac80211/tx.c
> +++ b/net/mac80211/tx.c
> @@ -1449,7 +1449,7 @@ void ieee80211_txq_init(struct
> ieee80211_sub_if_data *sdata,
>  	codel_vars_init(&txqi->def_cvars);
>  	codel_stats_init(&txqi->cstats);
>  	__skb_queue_head_init(&txqi->frags);
> -	INIT_LIST_HEAD(&txqi->schedule_order);
> +	RB_CLEAR_NODE(&txqi->schedule_order);
> 
>  	txqi->txq.vif = &sdata->vif;
> 
> @@ -1493,9 +1493,7 @@ void ieee80211_txq_purge(struct ieee80211_local 
> *local,
>  	ieee80211_purge_tx_queue(&local->hw, &txqi->frags);
>  	spin_unlock_bh(&fq->lock);
> 
> -	spin_lock_bh(&local->active_txq_lock[txqi->txq.ac]);
> -	list_del_init(&txqi->schedule_order);
> -	spin_unlock_bh(&local->active_txq_lock[txqi->txq.ac]);
> +	ieee80211_unschedule_txq(&local->hw, &txqi->txq);
>  }
> 
>  void ieee80211_txq_set_params(struct ieee80211_local *local)
> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 
> ac)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> +	struct rb_node *node = local->schedule_pos[ac];
>  	struct txq_info *txqi = NULL;
> +	bool first = false;
> 
>  	lockdep_assert_held(&local->active_txq_lock[ac]);
> 
> - begin:
> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
> -					struct txq_info,
> -					schedule_order);
> -	if (!txqi)
> +	if (!node) {
> +		node = rb_first_cached(&local->active_txqs[ac]);
> +		first = true;
> +	} else
> +		node = rb_next(node);
> +
> +	if (!node)
>  		return NULL;
> 
> +	txqi = container_of(node, struct txq_info, schedule_order);
> +
>  	if (txqi->txq.sta) {
>  		struct sta_info *sta = container_of(txqi->txq.sta,
>  						struct sta_info, sta);
> 
> -		if (sta->airtime[txqi->txq.ac].deficit < 0) {
> -			sta->airtime[txqi->txq.ac].deficit +=
> -				sta->airtime_weight;
> -			list_move_tail(&txqi->schedule_order,
> -				       &local->active_txqs[txqi->txq.ac]);
> -			goto begin;
> +		if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
> +			if (first)
> +				local->airtime_v_t[ac] = sta->airtime[ac].v_t;
> +			else
> +				return NULL;
>  		}
>  	}
> 
> 
> -	if (txqi->schedule_round == local->schedule_round[ac])
> -		return NULL;
> -
> -	list_del_init(&txqi->schedule_order);
> -	txqi->schedule_round = local->schedule_round[ac];
> +	local->schedule_pos[ac] = node;
>  	return &txqi->txq;
>  }
>  EXPORT_SYMBOL(ieee80211_next_txq);
> 
> -void ieee80211_return_txq(struct ieee80211_hw *hw,
> +static void __ieee80211_insert_txq(struct rb_root_cached *root,
> +				   struct txq_info *txqi, u8 ac)
> +{
> +	struct rb_node **new = &root->rb_root.rb_node;
> +	struct rb_node *parent = NULL;
> +	struct txq_info *__txqi;
> +	bool leftmost = true;
> +
> +	while (*new) {
> +		parent = *new;
> +		__txqi = rb_entry(parent, struct txq_info, schedule_order);
> +
> +		if (!txqi->txq.sta) {
> +			/* new txqi has no sta - insert to the left */
> +			new = &parent->rb_left;
> +		} else if (!__txqi->txq.sta) {
> +			/* existing txqi has no sta - insert to the right */
> +			new = &parent->rb_right;
> +			leftmost = false;
> +		} else {
> +			struct sta_info *old_sta = container_of(__txqi->txq.sta,
> +								struct sta_info,
> +								sta);
> +			struct sta_info *new_sta = container_of(txqi->txq.sta,
> +								struct sta_info,
> +								sta);
> +
> +			if (new_sta->airtime[ac].v_t <= old_sta->airtime[ac].v_t)
> +				new = &parent->rb_left;
> +			else {
> +				new = &parent->rb_right;
> +				leftmost = false;
> +			}
> +
> +		}
> +	}
> +
> +	rb_link_node(&txqi->schedule_order, parent, new);
> +	rb_insert_color_cached(&txqi->schedule_order, root, leftmost);
> +}
> +
> +void ieee80211_schedule_txq(struct ieee80211_hw *hw,
> +			    struct ieee80211_txq *txq)
> +	__acquires(txq_lock) __releases(txq_lock)
> +{
> +	struct ieee80211_local *local = hw_to_local(hw);
> +	struct txq_info *txqi = to_txq_info(txq);
> +	u8 ac = txq->ac;
> +
> +	spin_lock_bh(&local->active_txq_lock[ac]);
> +
> +	if (!RB_EMPTY_NODE(&txqi->schedule_order))
> +		goto out;
> +
> +	if (txq->sta) {
> +		struct sta_info *sta = container_of(txq->sta,
> +						    struct sta_info, sta);
> +
> +		local->airtime_weight_sum[ac] += sta->airtime_weight;
> +		if (local->airtime_v_t[ac] > AIRTIME_GRACE)
> +			sta->airtime[ac].v_t = max(local->airtime_v_t[ac] - AIRTIME_GRACE,
> +						   sta->airtime[ac].v_t);
> +	}
> +
> +	__ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
> +
> + out:
> +	spin_unlock_bh(&local->active_txq_lock[ac]);
> +}
> +EXPORT_SYMBOL(ieee80211_schedule_txq);
> +
> +void ieee80211_resort_txq(struct ieee80211_hw *hw,
>  			  struct ieee80211_txq *txq)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
>  	struct txq_info *txqi = to_txq_info(txq);
> +	u8 ac = txq->ac;
> +
> +	if (!RB_EMPTY_NODE(&txqi->schedule_order)) {
> +		rb_erase_cached(&txqi->schedule_order,
> +				&local->active_txqs[ac]);
> +		RB_CLEAR_NODE(&txqi->schedule_order);
> +		__ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
> +	}
> +}
> +
> +static void __ieee80211_unschedule_txq(struct ieee80211_hw *hw,
> +				       struct ieee80211_txq *txq)
> +{
> +	struct ieee80211_local *local = hw_to_local(hw);
> +	struct txq_info *txqi = to_txq_info(txq);
> +	u8 ac = txq->ac;
> 
>  	lockdep_assert_held(&local->active_txq_lock[txq->ac]);
> 
> -	if (list_empty(&txqi->schedule_order) &&
> -	    (!skb_queue_empty(&txqi->frags) || txqi->tin.backlog_packets)) {
> -		/* If airtime accounting is active, always enqueue STAs at the
> -		 * head of the list to ensure that they only get moved to the
> -		 * back by the airtime DRR scheduler once they have a negative
> -		 * deficit. A station that already has a negative deficit will
> -		 * get immediately moved to the back of the list on the next
> -		 * call to ieee80211_next_txq().
> -		 */
> -		if (txqi->txq.sta &&
> -		    wiphy_ext_feature_isset(local->hw.wiphy,
> -					    NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
> -			list_add(&txqi->schedule_order,
> -				 &local->active_txqs[txq->ac]);
> -		else
> -			list_add_tail(&txqi->schedule_order,
> -				      &local->active_txqs[txq->ac]);
> +	if (RB_EMPTY_NODE(&txqi->schedule_order))
> +		return;
> +
> +	if (txq->sta) {
> +		struct sta_info *sta = container_of(txq->sta,
> +						    struct sta_info, sta);
> +
> +		local->airtime_weight_sum[ac] -= sta->airtime_weight;
>  	}
> +
> +	rb_erase_cached(&txqi->schedule_order,
> +			&local->active_txqs[txq->ac]);
> +	RB_CLEAR_NODE(&txqi->schedule_order);
>  }
> -EXPORT_SYMBOL(ieee80211_return_txq);
> 
> -void ieee80211_schedule_txq(struct ieee80211_hw *hw,
> -			    struct ieee80211_txq *txq)
> +void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
> +			      struct ieee80211_txq *txq)
>  	__acquires(txq_lock) __releases(txq_lock)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> 
>  	spin_lock_bh(&local->active_txq_lock[txq->ac]);
> -	ieee80211_return_txq(hw, txq);
> +	__ieee80211_unschedule_txq(hw, txq);
>  	spin_unlock_bh(&local->active_txq_lock[txq->ac]);
>  }
> -EXPORT_SYMBOL(ieee80211_schedule_txq);
> +
> +void ieee80211_return_txq(struct ieee80211_hw *hw,
> +			  struct ieee80211_txq *txq)
> +{
> +	struct ieee80211_local *local = hw_to_local(hw);
> +	struct txq_info *txqi = to_txq_info(txq);
> +
> +	lockdep_assert_held(&local->active_txq_lock[txq->ac]);
> +
> +	if (!RB_EMPTY_NODE(&txqi->schedule_order) &&
> +	    (skb_queue_empty(&txqi->frags) && !txqi->tin.backlog_packets))
> +		__ieee80211_unschedule_txq(hw, txq);
> +}
> +EXPORT_SYMBOL(ieee80211_return_txq);
> 
>  bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
>  				struct ieee80211_txq *txq)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> -	struct txq_info *iter, *tmp, *txqi = to_txq_info(txq);
> +	struct txq_info *txqi = to_txq_info(txq);
>  	struct sta_info *sta;
>  	u8 ac = txq->ac;
> 
>  	lockdep_assert_held(&local->active_txq_lock[ac]);
> 
>  	if (!txqi->txq.sta)
> -		goto out;
> -
> -	if (list_empty(&txqi->schedule_order))
> -		goto out;
> -
> -	list_for_each_entry_safe(iter, tmp, &local->active_txqs[ac],
> -				 schedule_order) {
> -		if (iter == txqi)
> -			break;
> -
> -		if (!iter->txq.sta) {
> -			list_move_tail(&iter->schedule_order,
> -				       &local->active_txqs[ac]);
> -			continue;
> -		}
> -		sta = container_of(iter->txq.sta, struct sta_info, sta);
> -		if (sta->airtime[ac].deficit < 0)
> -			sta->airtime[ac].deficit += sta->airtime_weight;
> -		list_move_tail(&iter->schedule_order, &local->active_txqs[ac]);
> -	}
> +		return true;
> 
>  	sta = container_of(txqi->txq.sta, struct sta_info, sta);
> -	if (sta->airtime[ac].deficit >= 0)
> -		goto out;
> -
> -	sta->airtime[ac].deficit += sta->airtime_weight;
> -	list_move_tail(&txqi->schedule_order, &local->active_txqs[ac]);
> -
> -	return false;
> -out:
> -	if (!list_empty(&txqi->schedule_order))
> -		list_del_init(&txqi->schedule_order);
> -
> -	return true;
> +	return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_txq_may_transmit);
> 
> @@ -3769,7 +3832,6 @@ void ieee80211_txq_schedule_start(struct
> ieee80211_hw *hw, u8 ac)
>  	struct ieee80211_local *local = hw_to_local(hw);
> 
>  	spin_lock_bh(&local->active_txq_lock[ac]);
> -	local->schedule_round[ac]++;
>  }
>  EXPORT_SYMBOL(ieee80211_txq_schedule_start);
> 
> @@ -3778,6 +3840,7 @@ void ieee80211_txq_schedule_end(struct
> ieee80211_hw *hw, u8 ac)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> 
> +	local->schedule_pos[ac] = NULL;
>  	spin_unlock_bh(&local->active_txq_lock[ac]);
>  }
>  EXPORT_SYMBOL(ieee80211_txq_schedule_end);
Toke Høiland-Jørgensen April 4, 2019, 8:31 a.m. UTC | #12
Yibo Zhao <yiboz@codeaurora.org> writes:

> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>> This switches the airtime scheduler in mac80211 to use a virtual 
>> time-based
>> scheduler instead of the round-robin scheduler used before. This has a
>> couple of advantages:
>> 
>> - No need to sync up the round-robin scheduler in firmware/hardware 
>> with
>>   the round-robin airtime scheduler.
>> 
>> - If several stations are eligible for transmission we can schedule 
>> both of
>>   them; no need to hard-block the scheduling rotation until the head of 
>> the
>>   queue has used up its quantum.
>> 
>> - The check of whether a station is eligible for transmission becomes
>>   simpler (in ieee80211_txq_may_transmit()).
>> 
>> The drawback is that scheduling becomes slightly more expensive, as we 
>> need
>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>> ieee80211_register_airtime() becomes O(logN) in the number of currently
>> scheduled TXQs. However, hopefully this number rarely grows too big 
>> (it's
>> only TXQs currently backlogged, not all associated stations), so it
>> shouldn't be too big of an issue.
>> 
>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
>> ieee80211_sta *pubsta, u8 tid,
>>  {
>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>  	struct ieee80211_local *local = sta->sdata->local;
>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>  	u8 ac = ieee80211_ac_from_tid(tid);
>> -	u32 airtime = 0;
>> +	u64 airtime = 0, weight_sum;
>> +
>> +	if (!txq)
>> +		return;
>> 
>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>  		airtime += tx_airtime;
>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>  		airtime += rx_airtime;
>> 
>> +	/* Weights scale so the unit weight is 256 */
>> +	airtime <<= 8;
>> +
>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>> +
>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>> -	sta->airtime[ac].deficit -= airtime;
>> +
>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>> +
>> +	local->airtime_v_t[ac] += airtime / weight_sum;
> Hi Toke,
>
> Please ignore the previous two broken emails regarding this new proposal 
> from me.
>
> It looks like local->airtime_v_t acts like a Tx criteria. Only the 
> stations with less airtime than that are valid for Tx. That means there 
> are situations, like 50 clients, that some of the stations can be used 
> to Tx when putting next_txq in the loop. Am I right?

I'm not sure what you mean here. Are you referring to the case where new
stations appear with a very low (zero) airtime_v_t? That is handled when
the station is enqueued.

>> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
> Another question. Any plan for taking v_t overflow situation into 
> consideration? u64 might be enough for low throughput products but not 
> sure for high end products. Something like below for reference:

The unit for the variable is time, not bytes, so it is unaffected by
throughput. 2**64 microseconds is 584554 *years* according to my
'units' binary, so don't think we have to worry too much about this
overflowing ;)

-Toke
Dave Taht April 4, 2019, 8:36 a.m. UTC | #13
On Thu, Apr 4, 2019 at 10:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Yibo Zhao <yiboz@codeaurora.org> writes:
>
> > On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
> >> This switches the airtime scheduler in mac80211 to use a virtual
> >> time-based
> >> scheduler instead of the round-robin scheduler used before. This has a
> >> couple of advantages:
> >>
> >> - No need to sync up the round-robin scheduler in firmware/hardware
> >> with
> >>   the round-robin airtime scheduler.
> >>
> >> - If several stations are eligible for transmission we can schedule
> >> both of
> >>   them; no need to hard-block the scheduling rotation until the head of
> >> the
> >>   queue has used up its quantum.
> >>
> >> - The check of whether a station is eligible for transmission becomes
> >>   simpler (in ieee80211_txq_may_transmit()).
> >>
> >> The drawback is that scheduling becomes slightly more expensive, as we
> >> need
> >> to maintain an rbtree of TXQs sorted by virtual time. This means that
> >> ieee80211_register_airtime() becomes O(logN) in the number of currently
> >> scheduled TXQs. However, hopefully this number rarely grows too big
> >> (it's
> >> only TXQs currently backlogged, not all associated stations), so it
> >> shouldn't be too big of an issue.
> >>
> >> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
> >> ieee80211_sta *pubsta, u8 tid,
> >>  {
> >>      struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
> >>      struct ieee80211_local *local = sta->sdata->local;
> >> +    struct ieee80211_txq *txq = sta->sta.txq[tid];
> >>      u8 ac = ieee80211_ac_from_tid(tid);
> >> -    u32 airtime = 0;
> >> +    u64 airtime = 0, weight_sum;
> >> +
> >> +    if (!txq)
> >> +            return;
> >>
> >>      if (sta->local->airtime_flags & AIRTIME_USE_TX)
> >>              airtime += tx_airtime;
> >>      if (sta->local->airtime_flags & AIRTIME_USE_RX)
> >>              airtime += rx_airtime;
> >>
> >> +    /* Weights scale so the unit weight is 256 */
> >> +    airtime <<= 8;
> >> +
> >>      spin_lock_bh(&local->active_txq_lock[ac]);
> >> +
> >>      sta->airtime[ac].tx_airtime += tx_airtime;
> >>      sta->airtime[ac].rx_airtime += rx_airtime;
> >> -    sta->airtime[ac].deficit -= airtime;
> >> +
> >> +    weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
> >> +
> >> +    local->airtime_v_t[ac] += airtime / weight_sum;
> > Hi Toke,
> >
> > Please ignore the previous two broken emails regarding this new proposal
> > from me.
> >
> > It looks like local->airtime_v_t acts like a Tx criteria. Only the
> > stations with less airtime than that are valid for Tx. That means there
> > are situations, like 50 clients, that some of the stations can be used
> > to Tx when putting next_txq in the loop. Am I right?
>
> I'm not sure what you mean here. Are you referring to the case where new
> stations appear with a very low (zero) airtime_v_t? That is handled when
> the station is enqueued.
>
> >> +    sta->airtime[ac].v_t += airtime / sta->airtime_weight;
> > Another question. Any plan for taking v_t overflow situation into
> > consideration? u64 might be enough for low throughput products but not
> > sure for high end products. Something like below for reference:
>
> The unit for the variable is time, not bytes, so it is unaffected by
> throughput. 2**64 microseconds is 584554 *years* according to my
> 'units' binary, so don't think we have to worry too much about this
> overflowing ;)

I tend to think more in terms in ns than us. Is this metric in us currently?

I figure having stuff that at least works correctly within the solar
system is a good start, and getting coverage to 250 light years
is sufficiently forward looking: http://www.atlasoftheuniverse.com/250lys.html

>
> -Toke
> _______________________________________________
> Make-wifi-fast mailing list
> Make-wifi-fast@lists.bufferbloat.net
> https://lists.bufferbloat.net/listinfo/make-wifi-fast
Toke Høiland-Jørgensen April 4, 2019, 8:50 a.m. UTC | #14
Dave Taht <dave.taht@gmail.com> writes:

> On Thu, Apr 4, 2019 at 10:31 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>
>> > On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>> >> This switches the airtime scheduler in mac80211 to use a virtual
>> >> time-based
>> >> scheduler instead of the round-robin scheduler used before. This has a
>> >> couple of advantages:
>> >>
>> >> - No need to sync up the round-robin scheduler in firmware/hardware
>> >> with
>> >>   the round-robin airtime scheduler.
>> >>
>> >> - If several stations are eligible for transmission we can schedule
>> >> both of
>> >>   them; no need to hard-block the scheduling rotation until the head of
>> >> the
>> >>   queue has used up its quantum.
>> >>
>> >> - The check of whether a station is eligible for transmission becomes
>> >>   simpler (in ieee80211_txq_may_transmit()).
>> >>
>> >> The drawback is that scheduling becomes slightly more expensive, as we
>> >> need
>> >> to maintain an rbtree of TXQs sorted by virtual time. This means that
>> >> ieee80211_register_airtime() becomes O(logN) in the number of currently
>> >> scheduled TXQs. However, hopefully this number rarely grows too big
>> >> (it's
>> >> only TXQs currently backlogged, not all associated stations), so it
>> >> shouldn't be too big of an issue.
>> >>
>> >> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
>> >> ieee80211_sta *pubsta, u8 tid,
>> >>  {
>> >>      struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>> >>      struct ieee80211_local *local = sta->sdata->local;
>> >> +    struct ieee80211_txq *txq = sta->sta.txq[tid];
>> >>      u8 ac = ieee80211_ac_from_tid(tid);
>> >> -    u32 airtime = 0;
>> >> +    u64 airtime = 0, weight_sum;
>> >> +
>> >> +    if (!txq)
>> >> +            return;
>> >>
>> >>      if (sta->local->airtime_flags & AIRTIME_USE_TX)
>> >>              airtime += tx_airtime;
>> >>      if (sta->local->airtime_flags & AIRTIME_USE_RX)
>> >>              airtime += rx_airtime;
>> >>
>> >> +    /* Weights scale so the unit weight is 256 */
>> >> +    airtime <<= 8;
>> >> +
>> >>      spin_lock_bh(&local->active_txq_lock[ac]);
>> >> +
>> >>      sta->airtime[ac].tx_airtime += tx_airtime;
>> >>      sta->airtime[ac].rx_airtime += rx_airtime;
>> >> -    sta->airtime[ac].deficit -= airtime;
>> >> +
>> >> +    weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>> >> +
>> >> +    local->airtime_v_t[ac] += airtime / weight_sum;
>> > Hi Toke,
>> >
>> > Please ignore the previous two broken emails regarding this new proposal
>> > from me.
>> >
>> > It looks like local->airtime_v_t acts like a Tx criteria. Only the
>> > stations with less airtime than that are valid for Tx. That means there
>> > are situations, like 50 clients, that some of the stations can be used
>> > to Tx when putting next_txq in the loop. Am I right?
>>
>> I'm not sure what you mean here. Are you referring to the case where new
>> stations appear with a very low (zero) airtime_v_t? That is handled when
>> the station is enqueued.
>>
>> >> +    sta->airtime[ac].v_t += airtime / sta->airtime_weight;
>> > Another question. Any plan for taking v_t overflow situation into
>> > consideration? u64 might be enough for low throughput products but not
>> > sure for high end products. Something like below for reference:
>>
>> The unit for the variable is time, not bytes, so it is unaffected by
>> throughput. 2**64 microseconds is 584554 *years* according to my
>> 'units' binary, so don't think we have to worry too much about this
>> overflowing ;)
>
> I tend to think more in terms in ns than us. Is this metric in us
> currently?

Yeah, WiFi stuff generally thinks in coarser time scales than you, then;
everything tends to be microseconds here (the actual time unit in the
standard is 1.024 us IIRC).

> I figure having stuff that at least works correctly within the solar
> system is a good start, and getting coverage to 250 light years
> is sufficiently forward looking: http://www.atlasoftheuniverse.com/250lys.html

Heh, yeah, not sure the WiFi MAC is appropriate for those distances ;)

-Toke
Yibo Zhao April 9, 2019, 1:25 p.m. UTC | #15
On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
> 
>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>> This switches the airtime scheduler in mac80211 to use a virtual
>>> time-based
>>> scheduler instead of the round-robin scheduler used before. This has 
>>> a
>>> couple of advantages:
>>> 
>>> - No need to sync up the round-robin scheduler in firmware/hardware
>>> with
>>>   the round-robin airtime scheduler.
>>> 
>>> - If several stations are eligible for transmission we can schedule
>>> both of
>>>   them; no need to hard-block the scheduling rotation until the head 
>>> of
>>> the
>>>   queue has used up its quantum.
>>> 
>>> - The check of whether a station is eligible for transmission becomes
>>>   simpler (in ieee80211_txq_may_transmit()).
>>> 
>>> The drawback is that scheduling becomes slightly more expensive, as 
>>> we
>>> need
>>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>>> ieee80211_register_airtime() becomes O(logN) in the number of 
>>> currently
>>> scheduled TXQs. However, hopefully this number rarely grows too big
>>> (it's
>>> only TXQs currently backlogged, not all associated stations), so it
>>> shouldn't be too big of an issue.
>>> 
>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
>>> ieee80211_sta *pubsta, u8 tid,
>>>  {
>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>>  	struct ieee80211_local *local = sta->sdata->local;
>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>> -	u32 airtime = 0;
>>> +	u64 airtime = 0, weight_sum;
>>> +
>>> +	if (!txq)
>>> +		return;
>>> 
>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>  		airtime += tx_airtime;
>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>  		airtime += rx_airtime;
>>> 
>>> +	/* Weights scale so the unit weight is 256 */
>>> +	airtime <<= 8;
>>> +
>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>> +
>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>> -	sta->airtime[ac].deficit -= airtime;
>>> +
>>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>>> +
>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>> Hi Toke,
>> 
>> Please ignore the previous two broken emails regarding this new 
>> proposal
>> from me.
>> 
>> It looks like local->airtime_v_t acts like a Tx criteria. Only the
>> stations with less airtime than that are valid for Tx. That means 
>> there
>> are situations, like 50 clients, that some of the stations can be used
>> to Tx when putting next_txq in the loop. Am I right?
> 
> I'm not sure what you mean here. Are you referring to the case where 
> new
> stations appear with a very low (zero) airtime_v_t? That is handled 
> when
> the station is enqueued.
Hi Toke,

Sorry for the confusion. I am not referring to the case that you 
mentioned though it can be solved by your subtle design, max(local vt, 
sta vt). :-)

Actually, my concern is situation about putting next_txq in the loop. 
Let me explain a little more and see below.

> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8
> ac)
>  {
>  	struct ieee80211_local *local = hw_to_local(hw);
> +	struct rb_node *node = local->schedule_pos[ac];
>  	struct txq_info *txqi = NULL;
> +	bool first = false;
> 
>  	lockdep_assert_held(&local->active_txq_lock[ac]);
> 
> - begin:
> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
> -					struct txq_info,
> -					schedule_order);
> -	if (!txqi)
> +	if (!node) {
> +		node = rb_first_cached(&local->active_txqs[ac]);
> +		first = true;
> +	} else
> +		node = rb_next(node);

Consider below piece of code from ath10k_mac_schedule_txq:

         ieee80211_txq_schedule_start(hw, ac);
         while ((txq = ieee80211_next_txq(hw, ac))) {
                 while (ath10k_mac_tx_can_push(hw, txq)) {
                         ret = ath10k_mac_tx_push_txq(hw, txq);
                         if (ret < 0)
                                 break;
                 }
                 ieee80211_return_txq(hw, txq);
                 ath10k_htt_tx_txq_update(hw, txq);
                 if (ret == -EBUSY)
                         break;
         }
         ieee80211_txq_schedule_end(hw, ac);

If my understanding is right, local->schedule_pos is used to record the 
last scheduled node and used for traversal rbtree for valid txq. There 
is chance that an empty txq is feeded to return_txq and got removed from 
rbtree. The empty txq will always be the rb_first node. Then in the 
following next_txq, local->schedule_pos becomes meaningless since its 
rb_next will return NULL and the loop break. Only rb_first get dequeued 
during this loop.

	if (!node || RB_EMPTY_NODE(node)) {
		node = rb_first_cached(&local->active_txqs[ac]);
		first = true;
	} else
		node = rb_next(node);

How about this? The nodes on the rbtree will be dequeued and removed 
from rbtree one by one until HW is busy. Please note local vt and sta vt 
will not be updated since txq lock is held during this time.


> 
>>> +	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
>> Another question. Any plan for taking v_t overflow situation into
>> consideration? u64 might be enough for low throughput products but not
>> sure for high end products. Something like below for reference:
> 
> The unit for the variable is time, not bytes, so it is unaffected by
> throughput. 2**64 microseconds is 584554 *years* according to my
> 'units' binary, so don't think we have to worry too much about this
> overflowing ;)

Cool, Great! Then no problem with that now.

> 
> -Toke
Toke Høiland-Jørgensen April 9, 2019, 8:41 p.m. UTC | #16
Yibo Zhao <yiboz@codeaurora.org> writes:

> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>> 
>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>> This switches the airtime scheduler in mac80211 to use a virtual
>>>> time-based
>>>> scheduler instead of the round-robin scheduler used before. This has 
>>>> a
>>>> couple of advantages:
>>>> 
>>>> - No need to sync up the round-robin scheduler in firmware/hardware
>>>> with
>>>>   the round-robin airtime scheduler.
>>>> 
>>>> - If several stations are eligible for transmission we can schedule
>>>> both of
>>>>   them; no need to hard-block the scheduling rotation until the head 
>>>> of
>>>> the
>>>>   queue has used up its quantum.
>>>> 
>>>> - The check of whether a station is eligible for transmission becomes
>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>> 
>>>> The drawback is that scheduling becomes slightly more expensive, as 
>>>> we
>>>> need
>>>> to maintain an rbtree of TXQs sorted by virtual time. This means that
>>>> ieee80211_register_airtime() becomes O(logN) in the number of 
>>>> currently
>>>> scheduled TXQs. However, hopefully this number rarely grows too big
>>>> (it's
>>>> only TXQs currently backlogged, not all associated stations), so it
>>>> shouldn't be too big of an issue.
>>>> 
>>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
>>>> ieee80211_sta *pubsta, u8 tid,
>>>>  {
>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>> -	u32 airtime = 0;
>>>> +	u64 airtime = 0, weight_sum;
>>>> +
>>>> +	if (!txq)
>>>> +		return;
>>>> 
>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>  		airtime += tx_airtime;
>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>  		airtime += rx_airtime;
>>>> 
>>>> +	/* Weights scale so the unit weight is 256 */
>>>> +	airtime <<= 8;
>>>> +
>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>> +
>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>> -	sta->airtime[ac].deficit -= airtime;
>>>> +
>>>> +	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
>>>> +
>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>> Hi Toke,
>>> 
>>> Please ignore the previous two broken emails regarding this new 
>>> proposal
>>> from me.
>>> 
>>> It looks like local->airtime_v_t acts like a Tx criteria. Only the
>>> stations with less airtime than that are valid for Tx. That means 
>>> there
>>> are situations, like 50 clients, that some of the stations can be used
>>> to Tx when putting next_txq in the loop. Am I right?
>> 
>> I'm not sure what you mean here. Are you referring to the case where 
>> new
>> stations appear with a very low (zero) airtime_v_t? That is handled 
>> when
>> the station is enqueued.
> Hi Toke,
>
> Sorry for the confusion. I am not referring to the case that you 
> mentioned though it can be solved by your subtle design, max(local vt, 
> sta vt). :-)
>
> Actually, my concern is situation about putting next_txq in the loop. 
> Let me explain a little more and see below.
>
>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8
>> ac)
>>  {
>>  	struct ieee80211_local *local = hw_to_local(hw);
>> +	struct rb_node *node = local->schedule_pos[ac];
>>  	struct txq_info *txqi = NULL;
>> +	bool first = false;
>> 
>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>> 
>> - begin:
>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>> -					struct txq_info,
>> -					schedule_order);
>> -	if (!txqi)
>> +	if (!node) {
>> +		node = rb_first_cached(&local->active_txqs[ac]);
>> +		first = true;
>> +	} else
>> +		node = rb_next(node);
>
> Consider below piece of code from ath10k_mac_schedule_txq:
>
>          ieee80211_txq_schedule_start(hw, ac);
>          while ((txq = ieee80211_next_txq(hw, ac))) {
>                  while (ath10k_mac_tx_can_push(hw, txq)) {
>                          ret = ath10k_mac_tx_push_txq(hw, txq);
>                          if (ret < 0)
>                                  break;
>                  }
>                  ieee80211_return_txq(hw, txq);
>                  ath10k_htt_tx_txq_update(hw, txq);
>                  if (ret == -EBUSY)
>                          break;
>          }
>          ieee80211_txq_schedule_end(hw, ac);
>
> If my understanding is right, local->schedule_pos is used to record the 
> last scheduled node and used for traversal rbtree for valid txq. There 
> is chance that an empty txq is feeded to return_txq and got removed from 
> rbtree. The empty txq will always be the rb_first node. Then in the 
> following next_txq, local->schedule_pos becomes meaningless since its 
> rb_next will return NULL and the loop break. Only rb_first get dequeued 
> during this loop.
>
> 	if (!node || RB_EMPTY_NODE(node)) {
> 		node = rb_first_cached(&local->active_txqs[ac]);
> 		first = true;
> 	} else
> 		node = rb_next(node);

Ah, I see what you mean. Yes, that would indeed be a problem - nice
catch! :)

> How about this? The nodes on the rbtree will be dequeued and removed
> from rbtree one by one until HW is busy. Please note local vt and sta
> vt will not be updated since txq lock is held during this time.

Insertion and removal from the rbtree are relatively expensive, so I'd
rather not do that for every txq. I think a better way to solve this
is to just defer the actual removal from the tree until
ieee80211_txq_schedule_end()... Will fix that when I submit this again.

-Toke
Yibo Zhao April 10, 2019, 6:35 a.m. UTC | #17
On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
> 
>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>> 
>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>> This switches the airtime scheduler in mac80211 to use a virtual
>>>>> time-based
>>>>> scheduler instead of the round-robin scheduler used before. This 
>>>>> has
>>>>> a
>>>>> couple of advantages:
>>>>> 
>>>>> - No need to sync up the round-robin scheduler in firmware/hardware
>>>>> with
>>>>>   the round-robin airtime scheduler.
>>>>> 
>>>>> - If several stations are eligible for transmission we can schedule
>>>>> both of
>>>>>   them; no need to hard-block the scheduling rotation until the 
>>>>> head
>>>>> of
>>>>> the
>>>>>   queue has used up its quantum.
>>>>> 
>>>>> - The check of whether a station is eligible for transmission 
>>>>> becomes
>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>> 
>>>>> The drawback is that scheduling becomes slightly more expensive, as
>>>>> we
>>>>> need
>>>>> to maintain an rbtree of TXQs sorted by virtual time. This means 
>>>>> that
>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>> currently
>>>>> scheduled TXQs. However, hopefully this number rarely grows too big
>>>>> (it's
>>>>> only TXQs currently backlogged, not all associated stations), so it
>>>>> shouldn't be too big of an issue.
>>>>> 
>>>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>  {
>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, 
>>>>> sta);
>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>> -	u32 airtime = 0;
>>>>> +	u64 airtime = 0, weight_sum;
>>>>> +
>>>>> +	if (!txq)
>>>>> +		return;
>>>>> 
>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>  		airtime += tx_airtime;
>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>  		airtime += rx_airtime;
>>>>> 
>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>> +	airtime <<= 8;
>>>>> +
>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>> +
>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>> +
>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?: 
>>>>> sta->airtime_weight;
>>>>> +
>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>> Hi Toke,
>>>> 
>>>> Please ignore the previous two broken emails regarding this new
>>>> proposal
>>>> from me.
>>>> 
>>>> It looks like local->airtime_v_t acts like a Tx criteria. Only the
>>>> stations with less airtime than that are valid for Tx. That means
>>>> there
>>>> are situations, like 50 clients, that some of the stations can be 
>>>> used
>>>> to Tx when putting next_txq in the loop. Am I right?
>>> 
>>> I'm not sure what you mean here. Are you referring to the case where
>>> new
>>> stations appear with a very low (zero) airtime_v_t? That is handled
>>> when
>>> the station is enqueued.
>> Hi Toke,
>> 
>> Sorry for the confusion. I am not referring to the case that you
>> mentioned though it can be solved by your subtle design, max(local vt,
>> sta vt). :-)
>> 
>> Actually, my concern is situation about putting next_txq in the loop.
>> Let me explain a little more and see below.
>> 
>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8
>>> ac)
>>>  {
>>>  	struct ieee80211_local *local = hw_to_local(hw);
>>> +	struct rb_node *node = local->schedule_pos[ac];
>>>  	struct txq_info *txqi = NULL;
>>> +	bool first = false;
>>> 
>>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>> 
>>> - begin:
>>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>> -					struct txq_info,
>>> -					schedule_order);
>>> -	if (!txqi)
>>> +	if (!node) {
>>> +		node = rb_first_cached(&local->active_txqs[ac]);
>>> +		first = true;
>>> +	} else
>>> +		node = rb_next(node);
>> 
>> Consider below piece of code from ath10k_mac_schedule_txq:
>> 
>>          ieee80211_txq_schedule_start(hw, ac);
>>          while ((txq = ieee80211_next_txq(hw, ac))) {
>>                  while (ath10k_mac_tx_can_push(hw, txq)) {
>>                          ret = ath10k_mac_tx_push_txq(hw, txq);
>>                          if (ret < 0)
>>                                  break;
>>                  }
>>                  ieee80211_return_txq(hw, txq);
>>                  ath10k_htt_tx_txq_update(hw, txq);
>>                  if (ret == -EBUSY)
>>                          break;
>>          }
>>          ieee80211_txq_schedule_end(hw, ac);
>> 
>> If my understanding is right, local->schedule_pos is used to record 
>> the
>> last scheduled node and used for traversal rbtree for valid txq. There
>> is chance that an empty txq is feeded to return_txq and got removed 
>> from
>> rbtree. The empty txq will always be the rb_first node. Then in the
>> following next_txq, local->schedule_pos becomes meaningless since its
>> rb_next will return NULL and the loop break. Only rb_first get 
>> dequeued
>> during this loop.
>> 
>> 	if (!node || RB_EMPTY_NODE(node)) {
>> 		node = rb_first_cached(&local->active_txqs[ac]);
>> 		first = true;
>> 	} else
>> 		node = rb_next(node);
> 
> Ah, I see what you mean. Yes, that would indeed be a problem - nice
> catch! :)
> 
>> How about this? The nodes on the rbtree will be dequeued and removed
>> from rbtree one by one until HW is busy. Please note local vt and sta
>> vt will not be updated since txq lock is held during this time.
> 
> Insertion and removal from the rbtree are relatively expensive, so I'd
> rather not do that for every txq. I think a better way to solve this
> is to just defer the actual removal from the tree until
> ieee80211_txq_schedule_end()... Will fix that when I submit this again.

Do you mean we keep the empty txqs in the rbtree until loop finishes and 
remove them in ieee80211_txq_schedule_end(may be put return_txq in it)? 
If it is the case, I suppose a list is needed to store the empty txqs so 
as to dequeue them in ieee80211_txq_schedule_end.

And one more thing,

> +               if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
> +                       if (first)
> +                               local->airtime_v_t[ac] = 
> sta->airtime[ac].v_t;
> +                       else
> +                               return NULL;

As local->airtime_v_t will not be updated during loop, we don't need to 
return NULL.

> 
> -Toke
Toke Høiland-Jørgensen April 10, 2019, 10:40 a.m. UTC | #18
Yibo Zhao <yiboz@codeaurora.org> writes:

> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>> 
>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>> 
>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>> This switches the airtime scheduler in mac80211 to use a virtual
>>>>>> time-based
>>>>>> scheduler instead of the round-robin scheduler used before. This 
>>>>>> has
>>>>>> a
>>>>>> couple of advantages:
>>>>>> 
>>>>>> - No need to sync up the round-robin scheduler in firmware/hardware
>>>>>> with
>>>>>>   the round-robin airtime scheduler.
>>>>>> 
>>>>>> - If several stations are eligible for transmission we can schedule
>>>>>> both of
>>>>>>   them; no need to hard-block the scheduling rotation until the 
>>>>>> head
>>>>>> of
>>>>>> the
>>>>>>   queue has used up its quantum.
>>>>>> 
>>>>>> - The check of whether a station is eligible for transmission 
>>>>>> becomes
>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>> 
>>>>>> The drawback is that scheduling becomes slightly more expensive, as
>>>>>> we
>>>>>> need
>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This means 
>>>>>> that
>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>> currently
>>>>>> scheduled TXQs. However, hopefully this number rarely grows too big
>>>>>> (it's
>>>>>> only TXQs currently backlogged, not all associated stations), so it
>>>>>> shouldn't be too big of an issue.
>>>>>> 
>>>>>> @@ -1831,18 +1830,32 @@ void ieee80211_sta_register_airtime(struct
>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>  {
>>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info, 
>>>>>> sta);
>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>> -	u32 airtime = 0;
>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>> +
>>>>>> +	if (!txq)
>>>>>> +		return;
>>>>>> 
>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>  		airtime += tx_airtime;
>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>  		airtime += rx_airtime;
>>>>>> 
>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>> +	airtime <<= 8;
>>>>>> +
>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>> +
>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>> +
>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?: 
>>>>>> sta->airtime_weight;
>>>>>> +
>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>>> Hi Toke,
>>>>> 
>>>>> Please ignore the previous two broken emails regarding this new
>>>>> proposal
>>>>> from me.
>>>>> 
>>>>> It looks like local->airtime_v_t acts like a Tx criteria. Only the
>>>>> stations with less airtime than that are valid for Tx. That means
>>>>> there
>>>>> are situations, like 50 clients, that some of the stations can be 
>>>>> used
>>>>> to Tx when putting next_txq in the loop. Am I right?
>>>> 
>>>> I'm not sure what you mean here. Are you referring to the case where
>>>> new
>>>> stations appear with a very low (zero) airtime_v_t? That is handled
>>>> when
>>>> the station is enqueued.
>>> Hi Toke,
>>> 
>>> Sorry for the confusion. I am not referring to the case that you
>>> mentioned though it can be solved by your subtle design, max(local vt,
>>> sta vt). :-)
>>> 
>>> Actually, my concern is situation about putting next_txq in the loop.
>>> Let me explain a little more and see below.
>>> 
>>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8
>>>> ac)
>>>>  {
>>>>  	struct ieee80211_local *local = hw_to_local(hw);
>>>> +	struct rb_node *node = local->schedule_pos[ac];
>>>>  	struct txq_info *txqi = NULL;
>>>> +	bool first = false;
>>>> 
>>>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>>> 
>>>> - begin:
>>>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>>> -					struct txq_info,
>>>> -					schedule_order);
>>>> -	if (!txqi)
>>>> +	if (!node) {
>>>> +		node = rb_first_cached(&local->active_txqs[ac]);
>>>> +		first = true;
>>>> +	} else
>>>> +		node = rb_next(node);
>>> 
>>> Consider below piece of code from ath10k_mac_schedule_txq:
>>> 
>>>          ieee80211_txq_schedule_start(hw, ac);
>>>          while ((txq = ieee80211_next_txq(hw, ac))) {
>>>                  while (ath10k_mac_tx_can_push(hw, txq)) {
>>>                          ret = ath10k_mac_tx_push_txq(hw, txq);
>>>                          if (ret < 0)
>>>                                  break;
>>>                  }
>>>                  ieee80211_return_txq(hw, txq);
>>>                  ath10k_htt_tx_txq_update(hw, txq);
>>>                  if (ret == -EBUSY)
>>>                          break;
>>>          }
>>>          ieee80211_txq_schedule_end(hw, ac);
>>> 
>>> If my understanding is right, local->schedule_pos is used to record 
>>> the
>>> last scheduled node and used for traversal rbtree for valid txq. There
>>> is chance that an empty txq is feeded to return_txq and got removed 
>>> from
>>> rbtree. The empty txq will always be the rb_first node. Then in the
>>> following next_txq, local->schedule_pos becomes meaningless since its
>>> rb_next will return NULL and the loop break. Only rb_first get 
>>> dequeued
>>> during this loop.
>>> 
>>> 	if (!node || RB_EMPTY_NODE(node)) {
>>> 		node = rb_first_cached(&local->active_txqs[ac]);
>>> 		first = true;
>>> 	} else
>>> 		node = rb_next(node);
>> 
>> Ah, I see what you mean. Yes, that would indeed be a problem - nice
>> catch! :)
>> 
>>> How about this? The nodes on the rbtree will be dequeued and removed
>>> from rbtree one by one until HW is busy. Please note local vt and sta
>>> vt will not be updated since txq lock is held during this time.
>> 
>> Insertion and removal from the rbtree are relatively expensive, so I'd
>> rather not do that for every txq. I think a better way to solve this
>> is to just defer the actual removal from the tree until
>> ieee80211_txq_schedule_end()... Will fix that when I submit this again.
>
> Do you mean we keep the empty txqs in the rbtree until loop finishes and 
> remove them in ieee80211_txq_schedule_end(may be put return_txq in it)? 
> If it is the case, I suppose a list is needed to store the empty txqs so 
> as to dequeue them in ieee80211_txq_schedule_end.

Yeah, return_txq() would just put "to be removed" TXQs on a list, and
schedule_end() would do the actual removal (after checking whether a new
packet showed up in the meantime).

> And one more thing,
>
>> +               if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
>> +                       if (first)
>> +                               local->airtime_v_t[ac] = 
>> sta->airtime[ac].v_t;
>> +                       else
>> +                               return NULL;
>
> As local->airtime_v_t will not be updated during loop, we don't need to 
> return NULL.

Yes we do; this is actually the break condition. I.e., stations whose
virtual time are higher than the global time (in local->airtime_v_t) are
not allowed to transmit. And since we are traversing them in order, when
we find the first such station, we are done and can break out of the
scheduling loop entirely (which is what we do by returning NULL). The
other branch in the inner if() is just for the case where no stations
are currently eligible to transmit according to this rule; here we don't
want to stall, so we advance the global timer so the first station
becomes eligible...

-Toke
Yibo Zhao April 11, 2019, 3:12 a.m. UTC | #19
On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
> 
>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>> 
>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>> 
>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>> This switches the airtime scheduler in mac80211 to use a virtual
>>>>>>> time-based
>>>>>>> scheduler instead of the round-robin scheduler used before. This
>>>>>>> has
>>>>>>> a
>>>>>>> couple of advantages:
>>>>>>> 
>>>>>>> - No need to sync up the round-robin scheduler in 
>>>>>>> firmware/hardware
>>>>>>> with
>>>>>>>   the round-robin airtime scheduler.
>>>>>>> 
>>>>>>> - If several stations are eligible for transmission we can 
>>>>>>> schedule
>>>>>>> both of
>>>>>>>   them; no need to hard-block the scheduling rotation until the
>>>>>>> head
>>>>>>> of
>>>>>>> the
>>>>>>>   queue has used up its quantum.
>>>>>>> 
>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>> becomes
>>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>>> 
>>>>>>> The drawback is that scheduling becomes slightly more expensive, 
>>>>>>> as
>>>>>>> we
>>>>>>> need
>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This means
>>>>>>> that
>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>> currently
>>>>>>> scheduled TXQs. However, hopefully this number rarely grows too 
>>>>>>> big
>>>>>>> (it's
>>>>>>> only TXQs currently backlogged, not all associated stations), so 
>>>>>>> it
>>>>>>> shouldn't be too big of an issue.
>>>>>>> 
>>>>>>> @@ -1831,18 +1830,32 @@ void 
>>>>>>> ieee80211_sta_register_airtime(struct
>>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>>  {
>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info,
>>>>>>> sta);
>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>> -	u32 airtime = 0;
>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>> +
>>>>>>> +	if (!txq)
>>>>>>> +		return;
>>>>>>> 
>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>  		airtime += tx_airtime;
>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>  		airtime += rx_airtime;
>>>>>>> 
>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>> +	airtime <<= 8;
>>>>>>> +
>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>> +
>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>> +
>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?:
>>>>>>> sta->airtime_weight;
>>>>>>> +
>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>>>> Hi Toke,
>>>>>> 
>>>>>> Please ignore the previous two broken emails regarding this new
>>>>>> proposal
>>>>>> from me.
>>>>>> 
>>>>>> It looks like local->airtime_v_t acts like a Tx criteria. Only the
>>>>>> stations with less airtime than that are valid for Tx. That means
>>>>>> there
>>>>>> are situations, like 50 clients, that some of the stations can be
>>>>>> used
>>>>>> to Tx when putting next_txq in the loop. Am I right?
>>>>> 
>>>>> I'm not sure what you mean here. Are you referring to the case 
>>>>> where
>>>>> new
>>>>> stations appear with a very low (zero) airtime_v_t? That is handled
>>>>> when
>>>>> the station is enqueued.
>>>> Hi Toke,
>>>> 
>>>> Sorry for the confusion. I am not referring to the case that you
>>>> mentioned though it can be solved by your subtle design, max(local 
>>>> vt,
>>>> sta vt). :-)
>>>> 
>>>> Actually, my concern is situation about putting next_txq in the 
>>>> loop.
>>>> Let me explain a little more and see below.
>>>> 
>>>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, 
>>>>> u8
>>>>> ac)
>>>>>  {
>>>>>  	struct ieee80211_local *local = hw_to_local(hw);
>>>>> +	struct rb_node *node = local->schedule_pos[ac];
>>>>>  	struct txq_info *txqi = NULL;
>>>>> +	bool first = false;
>>>>> 
>>>>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>> 
>>>>> - begin:
>>>>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>>>> -					struct txq_info,
>>>>> -					schedule_order);
>>>>> -	if (!txqi)
>>>>> +	if (!node) {
>>>>> +		node = rb_first_cached(&local->active_txqs[ac]);
>>>>> +		first = true;
>>>>> +	} else
>>>>> +		node = rb_next(node);
>>>> 
>>>> Consider below piece of code from ath10k_mac_schedule_txq:
>>>> 
>>>>          ieee80211_txq_schedule_start(hw, ac);
>>>>          while ((txq = ieee80211_next_txq(hw, ac))) {
>>>>                  while (ath10k_mac_tx_can_push(hw, txq)) {
>>>>                          ret = ath10k_mac_tx_push_txq(hw, txq);
>>>>                          if (ret < 0)
>>>>                                  break;
>>>>                  }
>>>>                  ieee80211_return_txq(hw, txq);
>>>>                  ath10k_htt_tx_txq_update(hw, txq);
>>>>                  if (ret == -EBUSY)
>>>>                          break;
>>>>          }
>>>>          ieee80211_txq_schedule_end(hw, ac);
>>>> 
>>>> If my understanding is right, local->schedule_pos is used to record
>>>> the
>>>> last scheduled node and used for traversal rbtree for valid txq. 
>>>> There
>>>> is chance that an empty txq is feeded to return_txq and got removed
>>>> from
>>>> rbtree. The empty txq will always be the rb_first node. Then in the
>>>> following next_txq, local->schedule_pos becomes meaningless since 
>>>> its
>>>> rb_next will return NULL and the loop break. Only rb_first get
>>>> dequeued
>>>> during this loop.
>>>> 
>>>> 	if (!node || RB_EMPTY_NODE(node)) {
>>>> 		node = rb_first_cached(&local->active_txqs[ac]);
>>>> 		first = true;
>>>> 	} else
>>>> 		node = rb_next(node);
>>> 
>>> Ah, I see what you mean. Yes, that would indeed be a problem - nice
>>> catch! :)
>>> 
>>>> How about this? The nodes on the rbtree will be dequeued and removed
>>>> from rbtree one by one until HW is busy. Please note local vt and 
>>>> sta
>>>> vt will not be updated since txq lock is held during this time.
>>> 
>>> Insertion and removal from the rbtree are relatively expensive, so 
>>> I'd
>>> rather not do that for every txq. I think a better way to solve this
>>> is to just defer the actual removal from the tree until
>>> ieee80211_txq_schedule_end()... Will fix that when I submit this 
>>> again.
>> 
>> Do you mean we keep the empty txqs in the rbtree until loop finishes 
>> and
>> remove them in ieee80211_txq_schedule_end(may be put return_txq in 
>> it)?
>> If it is the case, I suppose a list is needed to store the empty txqs 
>> so
>> as to dequeue them in ieee80211_txq_schedule_end.
> 
> Yeah, return_txq() would just put "to be removed" TXQs on a list, and
> schedule_end() would do the actual removal (after checking whether a 
> new
> packet showed up in the meantime).

SGTM

> 
>> And one more thing,
>> 
>>> +               if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
>>> +                       if (first)
>>> +                               local->airtime_v_t[ac] =
>>> sta->airtime[ac].v_t;
>>> +                       else
>>> +                               return NULL;
>> 
>> As local->airtime_v_t will not be updated during loop, we don't need 
>> to
>> return NULL.
> 
> Yes we do; this is actually the break condition. I.e., stations whose
> virtual time are higher than the global time (in local->airtime_v_t) 
> are
> not allowed to transmit. And since we are traversing them in order, 
> when
> we find the first such station, we are done and can break out of the
> scheduling loop entirely (which is what we do by returning NULL). The
> other branch in the inner if() is just for the case where no stations
> are currently eligible to transmit according to this rule; here we 
> don't
> want to stall, so we advance the global timer so the first station
> becomes eligible...

Yes,the inner if() make sure first node always get scheduled no matter 
its vt.

To detail my concern, let's assume only two nodes in the tree and empty 
nodes will be in tree until schedule_end(). In the loop and in case hw 
is not busy, ath10k will drain every node next_txq returned before 
asking for another txq again. Then as we are traversing to next rb node, 
it is highly possible the second node is not allowed to transmit since 
the global time has not been updated yet as the active txq lock is held. 
At this time, only second node on the tree has data and hw is capable of 
sending more data. I don't think the second node is not valid for 
transmission in this situation.

With more nodes in the tree in this situation, I think same thing 
happens that all nodes except the first node are not allowed to transmit 
since none of their vts are less than the global time which is not 
updated in time. The loop breaks when we are checking the second node.

> 
> -Toke
Toke Høiland-Jørgensen April 11, 2019, 11:24 a.m. UTC | #20
Yibo Zhao <yiboz@codeaurora.org> writes:

> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>> 
>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>> 
>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>> 
>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>> This switches the airtime scheduler in mac80211 to use a virtual
>>>>>>>> time-based
>>>>>>>> scheduler instead of the round-robin scheduler used before. This
>>>>>>>> has
>>>>>>>> a
>>>>>>>> couple of advantages:
>>>>>>>> 
>>>>>>>> - No need to sync up the round-robin scheduler in 
>>>>>>>> firmware/hardware
>>>>>>>> with
>>>>>>>>   the round-robin airtime scheduler.
>>>>>>>> 
>>>>>>>> - If several stations are eligible for transmission we can 
>>>>>>>> schedule
>>>>>>>> both of
>>>>>>>>   them; no need to hard-block the scheduling rotation until the
>>>>>>>> head
>>>>>>>> of
>>>>>>>> the
>>>>>>>>   queue has used up its quantum.
>>>>>>>> 
>>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>>> becomes
>>>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>>>> 
>>>>>>>> The drawback is that scheduling becomes slightly more expensive, 
>>>>>>>> as
>>>>>>>> we
>>>>>>>> need
>>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This means
>>>>>>>> that
>>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>>> currently
>>>>>>>> scheduled TXQs. However, hopefully this number rarely grows too 
>>>>>>>> big
>>>>>>>> (it's
>>>>>>>> only TXQs currently backlogged, not all associated stations), so 
>>>>>>>> it
>>>>>>>> shouldn't be too big of an issue.
>>>>>>>> 
>>>>>>>> @@ -1831,18 +1830,32 @@ void 
>>>>>>>> ieee80211_sta_register_airtime(struct
>>>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>>>  {
>>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info,
>>>>>>>> sta);
>>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>> -	u32 airtime = 0;
>>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>>> +
>>>>>>>> +	if (!txq)
>>>>>>>> +		return;
>>>>>>>> 
>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>  		airtime += tx_airtime;
>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>  		airtime += rx_airtime;
>>>>>>>> 
>>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>>> +	airtime <<= 8;
>>>>>>>> +
>>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>> +
>>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>>> +
>>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?:
>>>>>>>> sta->airtime_weight;
>>>>>>>> +
>>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>>>>> Hi Toke,
>>>>>>> 
>>>>>>> Please ignore the previous two broken emails regarding this new
>>>>>>> proposal
>>>>>>> from me.
>>>>>>> 
>>>>>>> It looks like local->airtime_v_t acts like a Tx criteria. Only the
>>>>>>> stations with less airtime than that are valid for Tx. That means
>>>>>>> there
>>>>>>> are situations, like 50 clients, that some of the stations can be
>>>>>>> used
>>>>>>> to Tx when putting next_txq in the loop. Am I right?
>>>>>> 
>>>>>> I'm not sure what you mean here. Are you referring to the case 
>>>>>> where
>>>>>> new
>>>>>> stations appear with a very low (zero) airtime_v_t? That is handled
>>>>>> when
>>>>>> the station is enqueued.
>>>>> Hi Toke,
>>>>> 
>>>>> Sorry for the confusion. I am not referring to the case that you
>>>>> mentioned though it can be solved by your subtle design, max(local 
>>>>> vt,
>>>>> sta vt). :-)
>>>>> 
>>>>> Actually, my concern is situation about putting next_txq in the 
>>>>> loop.
>>>>> Let me explain a little more and see below.
>>>>> 
>>>>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>>>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, 
>>>>>> u8
>>>>>> ac)
>>>>>>  {
>>>>>>  	struct ieee80211_local *local = hw_to_local(hw);
>>>>>> +	struct rb_node *node = local->schedule_pos[ac];
>>>>>>  	struct txq_info *txqi = NULL;
>>>>>> +	bool first = false;
>>>>>> 
>>>>>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>>> 
>>>>>> - begin:
>>>>>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>>>>> -					struct txq_info,
>>>>>> -					schedule_order);
>>>>>> -	if (!txqi)
>>>>>> +	if (!node) {
>>>>>> +		node = rb_first_cached(&local->active_txqs[ac]);
>>>>>> +		first = true;
>>>>>> +	} else
>>>>>> +		node = rb_next(node);
>>>>> 
>>>>> Consider below piece of code from ath10k_mac_schedule_txq:
>>>>> 
>>>>>          ieee80211_txq_schedule_start(hw, ac);
>>>>>          while ((txq = ieee80211_next_txq(hw, ac))) {
>>>>>                  while (ath10k_mac_tx_can_push(hw, txq)) {
>>>>>                          ret = ath10k_mac_tx_push_txq(hw, txq);
>>>>>                          if (ret < 0)
>>>>>                                  break;
>>>>>                  }
>>>>>                  ieee80211_return_txq(hw, txq);
>>>>>                  ath10k_htt_tx_txq_update(hw, txq);
>>>>>                  if (ret == -EBUSY)
>>>>>                          break;
>>>>>          }
>>>>>          ieee80211_txq_schedule_end(hw, ac);
>>>>> 
>>>>> If my understanding is right, local->schedule_pos is used to record
>>>>> the
>>>>> last scheduled node and used for traversal rbtree for valid txq. 
>>>>> There
>>>>> is chance that an empty txq is feeded to return_txq and got removed
>>>>> from
>>>>> rbtree. The empty txq will always be the rb_first node. Then in the
>>>>> following next_txq, local->schedule_pos becomes meaningless since 
>>>>> its
>>>>> rb_next will return NULL and the loop break. Only rb_first get
>>>>> dequeued
>>>>> during this loop.
>>>>> 
>>>>> 	if (!node || RB_EMPTY_NODE(node)) {
>>>>> 		node = rb_first_cached(&local->active_txqs[ac]);
>>>>> 		first = true;
>>>>> 	} else
>>>>> 		node = rb_next(node);
>>>> 
>>>> Ah, I see what you mean. Yes, that would indeed be a problem - nice
>>>> catch! :)
>>>> 
>>>>> How about this? The nodes on the rbtree will be dequeued and removed
>>>>> from rbtree one by one until HW is busy. Please note local vt and 
>>>>> sta
>>>>> vt will not be updated since txq lock is held during this time.
>>>> 
>>>> Insertion and removal from the rbtree are relatively expensive, so 
>>>> I'd
>>>> rather not do that for every txq. I think a better way to solve this
>>>> is to just defer the actual removal from the tree until
>>>> ieee80211_txq_schedule_end()... Will fix that when I submit this 
>>>> again.
>>> 
>>> Do you mean we keep the empty txqs in the rbtree until loop finishes 
>>> and
>>> remove them in ieee80211_txq_schedule_end(may be put return_txq in 
>>> it)?
>>> If it is the case, I suppose a list is needed to store the empty txqs 
>>> so
>>> as to dequeue them in ieee80211_txq_schedule_end.
>> 
>> Yeah, return_txq() would just put "to be removed" TXQs on a list, and
>> schedule_end() would do the actual removal (after checking whether a 
>> new
>> packet showed up in the meantime).
>
> SGTM
>
>> 
>>> And one more thing,
>>> 
>>>> +               if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
>>>> +                       if (first)
>>>> +                               local->airtime_v_t[ac] =
>>>> sta->airtime[ac].v_t;
>>>> +                       else
>>>> +                               return NULL;
>>> 
>>> As local->airtime_v_t will not be updated during loop, we don't need 
>>> to
>>> return NULL.
>> 
>> Yes we do; this is actually the break condition. I.e., stations whose
>> virtual time are higher than the global time (in local->airtime_v_t) 
>> are
>> not allowed to transmit. And since we are traversing them in order, 
>> when
>> we find the first such station, we are done and can break out of the
>> scheduling loop entirely (which is what we do by returning NULL). The
>> other branch in the inner if() is just for the case where no stations
>> are currently eligible to transmit according to this rule; here we 
>> don't
>> want to stall, so we advance the global timer so the first station
>> becomes eligible...
>
> Yes,the inner if() make sure first node always get scheduled no matter 
> its vt.
>
> To detail my concern, let's assume only two nodes in the tree and
> empty nodes will be in tree until schedule_end(). In the loop and in
> case hw is not busy, ath10k will drain every node next_txq returned
> before asking for another txq again. Then as we are traversing to next
> rb node, it is highly possible the second node is not allowed to
> transmit since the global time has not been updated yet as the active
> txq lock is held. At this time, only second node on the tree has data
> and hw is capable of sending more data. I don't think the second node
> is not valid for transmission in this situation.
>
> With more nodes in the tree in this situation, I think same thing
> happens that all nodes except the first node are not allowed to
> transmit since none of their vts are less than the global time which
> is not updated in time. The loop breaks when we are checking the
> second node.

Yeah, in many cases we will end up throttling all but the first (couple
of) node(s). This is by design; otherwise we can't ensure fairness. As
long as we are making forward progress that is fine, though...

-Toke
Yibo Zhao April 12, 2019, 6:34 a.m. UTC | #21
On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
> 
>> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>> 
>>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>> 
>>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>> 
>>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>>> This switches the airtime scheduler in mac80211 to use a 
>>>>>>>>> virtual
>>>>>>>>> time-based
>>>>>>>>> scheduler instead of the round-robin scheduler used before. 
>>>>>>>>> This
>>>>>>>>> has
>>>>>>>>> a
>>>>>>>>> couple of advantages:
>>>>>>>>> 
>>>>>>>>> - No need to sync up the round-robin scheduler in
>>>>>>>>> firmware/hardware
>>>>>>>>> with
>>>>>>>>>   the round-robin airtime scheduler.
>>>>>>>>> 
>>>>>>>>> - If several stations are eligible for transmission we can
>>>>>>>>> schedule
>>>>>>>>> both of
>>>>>>>>>   them; no need to hard-block the scheduling rotation until the
>>>>>>>>> head
>>>>>>>>> of
>>>>>>>>> the
>>>>>>>>>   queue has used up its quantum.
>>>>>>>>> 
>>>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>>>> becomes
>>>>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>>>>> 
>>>>>>>>> The drawback is that scheduling becomes slightly more 
>>>>>>>>> expensive,
>>>>>>>>> as
>>>>>>>>> we
>>>>>>>>> need
>>>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This 
>>>>>>>>> means
>>>>>>>>> that
>>>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>>>> currently
>>>>>>>>> scheduled TXQs. However, hopefully this number rarely grows too
>>>>>>>>> big
>>>>>>>>> (it's
>>>>>>>>> only TXQs currently backlogged, not all associated stations), 
>>>>>>>>> so
>>>>>>>>> it
>>>>>>>>> shouldn't be too big of an issue.
>>>>>>>>> 
>>>>>>>>> @@ -1831,18 +1830,32 @@ void
>>>>>>>>> ieee80211_sta_register_airtime(struct
>>>>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>>>>  {
>>>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info,
>>>>>>>>> sta);
>>>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>>> -	u32 airtime = 0;
>>>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>>>> +
>>>>>>>>> +	if (!txq)
>>>>>>>>> +		return;
>>>>>>>>> 
>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>>  		airtime += tx_airtime;
>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>>  		airtime += rx_airtime;
>>>>>>>>> 
>>>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>>>> +	airtime <<= 8;
>>>>>>>>> +
>>>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>> +
>>>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>>>> +
>>>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?:
>>>>>>>>> sta->airtime_weight;
>>>>>>>>> +
>>>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>>>>>>> Hi Toke,
>>>>>>>> 
>>>>>>>> Please ignore the previous two broken emails regarding this new
>>>>>>>> proposal
>>>>>>>> from me.
>>>>>>>> 
>>>>>>>> It looks like local->airtime_v_t acts like a Tx criteria. Only 
>>>>>>>> the
>>>>>>>> stations with less airtime than that are valid for Tx. That 
>>>>>>>> means
>>>>>>>> there
>>>>>>>> are situations, like 50 clients, that some of the stations can 
>>>>>>>> be
>>>>>>>> used
>>>>>>>> to Tx when putting next_txq in the loop. Am I right?
>>>>>>> 
>>>>>>> I'm not sure what you mean here. Are you referring to the case
>>>>>>> where
>>>>>>> new
>>>>>>> stations appear with a very low (zero) airtime_v_t? That is 
>>>>>>> handled
>>>>>>> when
>>>>>>> the station is enqueued.
>>>>>> Hi Toke,
>>>>>> 
>>>>>> Sorry for the confusion. I am not referring to the case that you
>>>>>> mentioned though it can be solved by your subtle design, max(local
>>>>>> vt,
>>>>>> sta vt). :-)
>>>>>> 
>>>>>> Actually, my concern is situation about putting next_txq in the
>>>>>> loop.
>>>>>> Let me explain a little more and see below.
>>>>>> 
>>>>>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>>>>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw 
>>>>>>> *hw,
>>>>>>> u8
>>>>>>> ac)
>>>>>>>  {
>>>>>>>  	struct ieee80211_local *local = hw_to_local(hw);
>>>>>>> +	struct rb_node *node = local->schedule_pos[ac];
>>>>>>>  	struct txq_info *txqi = NULL;
>>>>>>> +	bool first = false;
>>>>>>> 
>>>>>>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>>>> 
>>>>>>> - begin:
>>>>>>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>>>>>> -					struct txq_info,
>>>>>>> -					schedule_order);
>>>>>>> -	if (!txqi)
>>>>>>> +	if (!node) {
>>>>>>> +		node = rb_first_cached(&local->active_txqs[ac]);
>>>>>>> +		first = true;
>>>>>>> +	} else
>>>>>>> +		node = rb_next(node);
>>>>>> 
>>>>>> Consider below piece of code from ath10k_mac_schedule_txq:
>>>>>> 
>>>>>>          ieee80211_txq_schedule_start(hw, ac);
>>>>>>          while ((txq = ieee80211_next_txq(hw, ac))) {
>>>>>>                  while (ath10k_mac_tx_can_push(hw, txq)) {
>>>>>>                          ret = ath10k_mac_tx_push_txq(hw, txq);
>>>>>>                          if (ret < 0)
>>>>>>                                  break;
>>>>>>                  }
>>>>>>                  ieee80211_return_txq(hw, txq);
>>>>>>                  ath10k_htt_tx_txq_update(hw, txq);
>>>>>>                  if (ret == -EBUSY)
>>>>>>                          break;
>>>>>>          }
>>>>>>          ieee80211_txq_schedule_end(hw, ac);
>>>>>> 
>>>>>> If my understanding is right, local->schedule_pos is used to 
>>>>>> record
>>>>>> the
>>>>>> last scheduled node and used for traversal rbtree for valid txq.
>>>>>> There
>>>>>> is chance that an empty txq is feeded to return_txq and got 
>>>>>> removed
>>>>>> from
>>>>>> rbtree. The empty txq will always be the rb_first node. Then in 
>>>>>> the
>>>>>> following next_txq, local->schedule_pos becomes meaningless since
>>>>>> its
>>>>>> rb_next will return NULL and the loop break. Only rb_first get
>>>>>> dequeued
>>>>>> during this loop.
>>>>>> 
>>>>>> 	if (!node || RB_EMPTY_NODE(node)) {
>>>>>> 		node = rb_first_cached(&local->active_txqs[ac]);
>>>>>> 		first = true;
>>>>>> 	} else
>>>>>> 		node = rb_next(node);
>>>>> 
>>>>> Ah, I see what you mean. Yes, that would indeed be a problem - nice
>>>>> catch! :)
>>>>> 
>>>>>> How about this? The nodes on the rbtree will be dequeued and 
>>>>>> removed
>>>>>> from rbtree one by one until HW is busy. Please note local vt and
>>>>>> sta
>>>>>> vt will not be updated since txq lock is held during this time.
>>>>> 
>>>>> Insertion and removal from the rbtree are relatively expensive, so
>>>>> I'd
>>>>> rather not do that for every txq. I think a better way to solve 
>>>>> this
>>>>> is to just defer the actual removal from the tree until
>>>>> ieee80211_txq_schedule_end()... Will fix that when I submit this
>>>>> again.
>>>> 
>>>> Do you mean we keep the empty txqs in the rbtree until loop finishes
>>>> and
>>>> remove them in ieee80211_txq_schedule_end(may be put return_txq in
>>>> it)?
>>>> If it is the case, I suppose a list is needed to store the empty 
>>>> txqs
>>>> so
>>>> as to dequeue them in ieee80211_txq_schedule_end.
>>> 
>>> Yeah, return_txq() would just put "to be removed" TXQs on a list, and
>>> schedule_end() would do the actual removal (after checking whether a
>>> new
>>> packet showed up in the meantime).
>> 
>> SGTM
>> 
>>> 
>>>> And one more thing,
>>>> 
>>>>> +               if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) 
>>>>> {
>>>>> +                       if (first)
>>>>> +                               local->airtime_v_t[ac] =
>>>>> sta->airtime[ac].v_t;
>>>>> +                       else
>>>>> +                               return NULL;
>>>> 
>>>> As local->airtime_v_t will not be updated during loop, we don't need
>>>> to
>>>> return NULL.
>>> 
>>> Yes we do; this is actually the break condition. I.e., stations whose
>>> virtual time are higher than the global time (in local->airtime_v_t)
>>> are
>>> not allowed to transmit. And since we are traversing them in order,
>>> when
>>> we find the first such station, we are done and can break out of the
>>> scheduling loop entirely (which is what we do by returning NULL). The
>>> other branch in the inner if() is just for the case where no stations
>>> are currently eligible to transmit according to this rule; here we
>>> don't
>>> want to stall, so we advance the global timer so the first station
>>> becomes eligible...
>> 
>> Yes,the inner if() make sure first node always get scheduled no matter
>> its vt.
>> 
>> To detail my concern, let's assume only two nodes in the tree and
>> empty nodes will be in tree until schedule_end(). In the loop and in
>> case hw is not busy, ath10k will drain every node next_txq returned
>> before asking for another txq again. Then as we are traversing to next
>> rb node, it is highly possible the second node is not allowed to
>> transmit since the global time has not been updated yet as the active
>> txq lock is held. At this time, only second node on the tree has data
>> and hw is capable of sending more data. I don't think the second node
>> is not valid for transmission in this situation.
>> 
>> With more nodes in the tree in this situation, I think same thing
>> happens that all nodes except the first node are not allowed to
>> transmit since none of their vts are less than the global time which
>> is not updated in time. The loop breaks when we are checking the
>> second node.
> 
> Yeah, in many cases we will end up throttling all but the first (couple
> of) node(s). This is by design; otherwise we can't ensure fairness. As
> long as we are making forward progress that is fine, though...
All right, I see your point. Fairness is our first priority. :)
> 
> -Toke
Yibo Zhao April 19, 2019, 3:05 p.m. UTC | #22
On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
> 
>> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>> 
>>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>> 
>>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>> 
>>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>>> This switches the airtime scheduler in mac80211 to use a 
>>>>>>>>> virtual
>>>>>>>>> time-based
>>>>>>>>> scheduler instead of the round-robin scheduler used before. 
>>>>>>>>> This
>>>>>>>>> has
>>>>>>>>> a
>>>>>>>>> couple of advantages:
>>>>>>>>> 
>>>>>>>>> - No need to sync up the round-robin scheduler in
>>>>>>>>> firmware/hardware
>>>>>>>>> with
>>>>>>>>>   the round-robin airtime scheduler.
>>>>>>>>> 
>>>>>>>>> - If several stations are eligible for transmission we can
>>>>>>>>> schedule
>>>>>>>>> both of
>>>>>>>>>   them; no need to hard-block the scheduling rotation until the
>>>>>>>>> head
>>>>>>>>> of
>>>>>>>>> the
>>>>>>>>>   queue has used up its quantum.
>>>>>>>>> 
>>>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>>>> becomes
>>>>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>>>>> 
>>>>>>>>> The drawback is that scheduling becomes slightly more 
>>>>>>>>> expensive,
>>>>>>>>> as
>>>>>>>>> we
>>>>>>>>> need
>>>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This 
>>>>>>>>> means
>>>>>>>>> that
>>>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>>>> currently
>>>>>>>>> scheduled TXQs. However, hopefully this number rarely grows too
>>>>>>>>> big
>>>>>>>>> (it's
>>>>>>>>> only TXQs currently backlogged, not all associated stations), 
>>>>>>>>> so
>>>>>>>>> it
>>>>>>>>> shouldn't be too big of an issue.
>>>>>>>>> 
>>>>>>>>> @@ -1831,18 +1830,32 @@ void
>>>>>>>>> ieee80211_sta_register_airtime(struct
>>>>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>>>>  {
>>>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info,
>>>>>>>>> sta);
>>>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>>> -	u32 airtime = 0;
>>>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>>>> +
>>>>>>>>> +	if (!txq)
>>>>>>>>> +		return;
>>>>>>>>> 
>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>>  		airtime += tx_airtime;
>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>>  		airtime += rx_airtime;
>>>>>>>>> 
>>>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>>>> +	airtime <<= 8;
>>>>>>>>> +
>>>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>> +
>>>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>>>> +
>>>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?:
>>>>>>>>> sta->airtime_weight;
>>>>>>>>> +
>>>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
Hi Toke,

I was porting this version of ATF design to my ath10k platform and found 
my old kernel version not supporting 64bit division. I'm wondering if it 
is necessary to use u64 for airtime and weight_sum here though I can 
find a solution for it. I think u32 might be enough. For airtime, 
u32_max / 256 = 7182219 us(718 ms). As for weight_sum, u32_max / 8092 us 
= 130490, meaning we can support more than 130000 nodes with airtime 
weight 8092 us.

Another finding was when I configured two 11ac STAs with different 
airtime weight, such as 256 and 1024 meaning ratio is 1:4, the 
throughput ratio was not roughly matching the ratio. Could you please 
share your results? I am not sure if it is due to platform difference.

>>>>>>>> Hi Toke,
>>>>>>>> 
>>>>>>>> Please ignore the previous two broken emails regarding this new
>>>>>>>> proposal
>>>>>>>> from me.
>>>>>>>> 
>>>>>>>> It looks like local->airtime_v_t acts like a Tx criteria. Only 
>>>>>>>> the
>>>>>>>> stations with less airtime than that are valid for Tx. That 
>>>>>>>> means
>>>>>>>> there
>>>>>>>> are situations, like 50 clients, that some of the stations can 
>>>>>>>> be
>>>>>>>> used
>>>>>>>> to Tx when putting next_txq in the loop. Am I right?
>>>>>>> 
>>>>>>> I'm not sure what you mean here. Are you referring to the case
>>>>>>> where
>>>>>>> new
>>>>>>> stations appear with a very low (zero) airtime_v_t? That is 
>>>>>>> handled
>>>>>>> when
>>>>>>> the station is enqueued.
>>>>>> Hi Toke,
>>>>>> 
>>>>>> Sorry for the confusion. I am not referring to the case that you
>>>>>> mentioned though it can be solved by your subtle design, max(local
>>>>>> vt,
>>>>>> sta vt). :-)
>>>>>> 
>>>>>> Actually, my concern is situation about putting next_txq in the
>>>>>> loop.
>>>>>> Let me explain a little more and see below.
>>>>>> 
>>>>>>> @@ -3640,126 +3638,191 @@ EXPORT_SYMBOL(ieee80211_tx_dequeue);
>>>>>>>  struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw 
>>>>>>> *hw,
>>>>>>> u8
>>>>>>> ac)
>>>>>>>  {
>>>>>>>  	struct ieee80211_local *local = hw_to_local(hw);
>>>>>>> +	struct rb_node *node = local->schedule_pos[ac];
>>>>>>>  	struct txq_info *txqi = NULL;
>>>>>>> +	bool first = false;
>>>>>>> 
>>>>>>>  	lockdep_assert_held(&local->active_txq_lock[ac]);
>>>>>>> 
>>>>>>> - begin:
>>>>>>> -	txqi = list_first_entry_or_null(&local->active_txqs[ac],
>>>>>>> -					struct txq_info,
>>>>>>> -					schedule_order);
>>>>>>> -	if (!txqi)
>>>>>>> +	if (!node) {
>>>>>>> +		node = rb_first_cached(&local->active_txqs[ac]);
>>>>>>> +		first = true;
>>>>>>> +	} else
>>>>>>> +		node = rb_next(node);
>>>>>> 
>>>>>> Consider below piece of code from ath10k_mac_schedule_txq:
>>>>>> 
>>>>>>          ieee80211_txq_schedule_start(hw, ac);
>>>>>>          while ((txq = ieee80211_next_txq(hw, ac))) {
>>>>>>                  while (ath10k_mac_tx_can_push(hw, txq)) {
>>>>>>                          ret = ath10k_mac_tx_push_txq(hw, txq);
>>>>>>                          if (ret < 0)
>>>>>>                                  break;
>>>>>>                  }
>>>>>>                  ieee80211_return_txq(hw, txq);
>>>>>>                  ath10k_htt_tx_txq_update(hw, txq);
>>>>>>                  if (ret == -EBUSY)
>>>>>>                          break;
>>>>>>          }
>>>>>>          ieee80211_txq_schedule_end(hw, ac);
>>>>>> 
>>>>>> If my understanding is right, local->schedule_pos is used to 
>>>>>> record
>>>>>> the
>>>>>> last scheduled node and used for traversal rbtree for valid txq.
>>>>>> There
>>>>>> is chance that an empty txq is feeded to return_txq and got 
>>>>>> removed
>>>>>> from
>>>>>> rbtree. The empty txq will always be the rb_first node. Then in 
>>>>>> the
>>>>>> following next_txq, local->schedule_pos becomes meaningless since
>>>>>> its
>>>>>> rb_next will return NULL and the loop break. Only rb_first get
>>>>>> dequeued
>>>>>> during this loop.
>>>>>> 
>>>>>> 	if (!node || RB_EMPTY_NODE(node)) {
>>>>>> 		node = rb_first_cached(&local->active_txqs[ac]);
>>>>>> 		first = true;
>>>>>> 	} else
>>>>>> 		node = rb_next(node);
>>>>> 
>>>>> Ah, I see what you mean. Yes, that would indeed be a problem - nice
>>>>> catch! :)
>>>>> 
>>>>>> How about this? The nodes on the rbtree will be dequeued and 
>>>>>> removed
>>>>>> from rbtree one by one until HW is busy. Please note local vt and
>>>>>> sta
>>>>>> vt will not be updated since txq lock is held during this time.
>>>>> 
>>>>> Insertion and removal from the rbtree are relatively expensive, so
>>>>> I'd
>>>>> rather not do that for every txq. I think a better way to solve 
>>>>> this
>>>>> is to just defer the actual removal from the tree until
>>>>> ieee80211_txq_schedule_end()... Will fix that when I submit this
>>>>> again.
>>>> 
>>>> Do you mean we keep the empty txqs in the rbtree until loop finishes
>>>> and
>>>> remove them in ieee80211_txq_schedule_end(may be put return_txq in
>>>> it)?
>>>> If it is the case, I suppose a list is needed to store the empty 
>>>> txqs
>>>> so
>>>> as to dequeue them in ieee80211_txq_schedule_end.
>>> 
>>> Yeah, return_txq() would just put "to be removed" TXQs on a list, and
>>> schedule_end() would do the actual removal (after checking whether a
>>> new
>>> packet showed up in the meantime).
>> 
>> SGTM
>> 
>>> 
>>>> And one more thing,
>>>> 
>>>>> +               if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) 
>>>>> {
>>>>> +                       if (first)
>>>>> +                               local->airtime_v_t[ac] =
>>>>> sta->airtime[ac].v_t;
>>>>> +                       else
>>>>> +                               return NULL;
>>>> 
>>>> As local->airtime_v_t will not be updated during loop, we don't need
>>>> to
>>>> return NULL.
>>> 
>>> Yes we do; this is actually the break condition. I.e., stations whose
>>> virtual time are higher than the global time (in local->airtime_v_t)
>>> are
>>> not allowed to transmit. And since we are traversing them in order,
>>> when
>>> we find the first such station, we are done and can break out of the
>>> scheduling loop entirely (which is what we do by returning NULL). The
>>> other branch in the inner if() is just for the case where no stations
>>> are currently eligible to transmit according to this rule; here we
>>> don't
>>> want to stall, so we advance the global timer so the first station
>>> becomes eligible...
>> 
>> Yes,the inner if() make sure first node always get scheduled no matter
>> its vt.
>> 
>> To detail my concern, let's assume only two nodes in the tree and
>> empty nodes will be in tree until schedule_end(). In the loop and in
>> case hw is not busy, ath10k will drain every node next_txq returned
>> before asking for another txq again. Then as we are traversing to next
>> rb node, it is highly possible the second node is not allowed to
>> transmit since the global time has not been updated yet as the active
>> txq lock is held. At this time, only second node on the tree has data
>> and hw is capable of sending more data. I don't think the second node
>> is not valid for transmission in this situation.
>> 
>> With more nodes in the tree in this situation, I think same thing
>> happens that all nodes except the first node are not allowed to
>> transmit since none of their vts are less than the global time which
>> is not updated in time. The loop breaks when we are checking the
>> second node.
> 
> Yeah, in many cases we will end up throttling all but the first (couple
> of) node(s). This is by design; otherwise we can't ensure fairness. As
> long as we are making forward progress that is fine, though...
> 
> -Toke
Toke Høiland-Jørgensen April 20, 2019, 9:15 p.m. UTC | #23
Yibo Zhao <yiboz@codeaurora.org> writes:

> On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>> 
>>> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>> 
>>>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>> 
>>>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>> 
>>>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> This switches the airtime scheduler in mac80211 to use a 
>>>>>>>>>> virtual
>>>>>>>>>> time-based
>>>>>>>>>> scheduler instead of the round-robin scheduler used before. 
>>>>>>>>>> This
>>>>>>>>>> has
>>>>>>>>>> a
>>>>>>>>>> couple of advantages:
>>>>>>>>>> 
>>>>>>>>>> - No need to sync up the round-robin scheduler in
>>>>>>>>>> firmware/hardware
>>>>>>>>>> with
>>>>>>>>>>   the round-robin airtime scheduler.
>>>>>>>>>> 
>>>>>>>>>> - If several stations are eligible for transmission we can
>>>>>>>>>> schedule
>>>>>>>>>> both of
>>>>>>>>>>   them; no need to hard-block the scheduling rotation until the
>>>>>>>>>> head
>>>>>>>>>> of
>>>>>>>>>> the
>>>>>>>>>>   queue has used up its quantum.
>>>>>>>>>> 
>>>>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>>>>> becomes
>>>>>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>>>>>> 
>>>>>>>>>> The drawback is that scheduling becomes slightly more 
>>>>>>>>>> expensive,
>>>>>>>>>> as
>>>>>>>>>> we
>>>>>>>>>> need
>>>>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This 
>>>>>>>>>> means
>>>>>>>>>> that
>>>>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>>>>> currently
>>>>>>>>>> scheduled TXQs. However, hopefully this number rarely grows too
>>>>>>>>>> big
>>>>>>>>>> (it's
>>>>>>>>>> only TXQs currently backlogged, not all associated stations), 
>>>>>>>>>> so
>>>>>>>>>> it
>>>>>>>>>> shouldn't be too big of an issue.
>>>>>>>>>> 
>>>>>>>>>> @@ -1831,18 +1830,32 @@ void
>>>>>>>>>> ieee80211_sta_register_airtime(struct
>>>>>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>>>>>  {
>>>>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct sta_info,
>>>>>>>>>> sta);
>>>>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>>>> -	u32 airtime = 0;
>>>>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>>>>> +
>>>>>>>>>> +	if (!txq)
>>>>>>>>>> +		return;
>>>>>>>>>> 
>>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>>>  		airtime += tx_airtime;
>>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>>>  		airtime += rx_airtime;
>>>>>>>>>> 
>>>>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>>>>> +	airtime <<= 8;
>>>>>>>>>> +
>>>>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>>> +
>>>>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>>>>> +
>>>>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?:
>>>>>>>>>> sta->airtime_weight;
>>>>>>>>>> +
>>>>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
> Hi Toke,
>
> I was porting this version of ATF design to my ath10k platform and found 
> my old kernel version not supporting 64bit division. I'm wondering if it 
> is necessary to use u64 for airtime and weight_sum here though I can 
> find a solution for it. I think u32 might be enough. For airtime, 
> u32_max / 256 = 7182219 us(718 ms). As for weight_sum, u32_max / 8092 us 
> = 130490, meaning we can support more than 130000 nodes with airtime 
> weight 8092 us.

As Felix said, we don't really want divides in the fast path at all. And
since the divisors are constant, we should be able to just pre-compute
reciprocals and turn the whole thing into multiplications...

> Another finding was when I configured two 11ac STAs with different
> airtime weight, such as 256 and 1024 meaning ratio is 1:4, the
> throughput ratio was not roughly matching the ratio. Could you please
> share your results? I am not sure if it is due to platform difference.

Hmm, I tested them with ath9k where things seemed to work equivalently
to the DRR. Are you testing the same hardware with that? Would be a good
baseline.

I am on vacation until the end of the month, but can share my actual
test results once I get back...

-Toke
Yibo Zhao April 30, 2019, 9:45 a.m. UTC | #24
On 2019-04-21 05:15, Toke Høiland-Jørgensen wrote:
> Yibo Zhao <yiboz@codeaurora.org> writes:
> 
>> On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote:
>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>> 
>>>> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>> 
>>>>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>> 
>>>>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>>> 
>>>>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>> This switches the airtime scheduler in mac80211 to use a
>>>>>>>>>>> virtual
>>>>>>>>>>> time-based
>>>>>>>>>>> scheduler instead of the round-robin scheduler used before.
>>>>>>>>>>> This
>>>>>>>>>>> has
>>>>>>>>>>> a
>>>>>>>>>>> couple of advantages:
>>>>>>>>>>> 
>>>>>>>>>>> - No need to sync up the round-robin scheduler in
>>>>>>>>>>> firmware/hardware
>>>>>>>>>>> with
>>>>>>>>>>>   the round-robin airtime scheduler.
>>>>>>>>>>> 
>>>>>>>>>>> - If several stations are eligible for transmission we can
>>>>>>>>>>> schedule
>>>>>>>>>>> both of
>>>>>>>>>>>   them; no need to hard-block the scheduling rotation until 
>>>>>>>>>>> the
>>>>>>>>>>> head
>>>>>>>>>>> of
>>>>>>>>>>> the
>>>>>>>>>>>   queue has used up its quantum.
>>>>>>>>>>> 
>>>>>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>>>>>> becomes
>>>>>>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>>>>>>> 
>>>>>>>>>>> The drawback is that scheduling becomes slightly more
>>>>>>>>>>> expensive,
>>>>>>>>>>> as
>>>>>>>>>>> we
>>>>>>>>>>> need
>>>>>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This
>>>>>>>>>>> means
>>>>>>>>>>> that
>>>>>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>>>>>> currently
>>>>>>>>>>> scheduled TXQs. However, hopefully this number rarely grows 
>>>>>>>>>>> too
>>>>>>>>>>> big
>>>>>>>>>>> (it's
>>>>>>>>>>> only TXQs currently backlogged, not all associated stations),
>>>>>>>>>>> so
>>>>>>>>>>> it
>>>>>>>>>>> shouldn't be too big of an issue.
>>>>>>>>>>> 
>>>>>>>>>>> @@ -1831,18 +1830,32 @@ void
>>>>>>>>>>> ieee80211_sta_register_airtime(struct
>>>>>>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>>>>>>  {
>>>>>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct 
>>>>>>>>>>> sta_info,
>>>>>>>>>>> sta);
>>>>>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>>>>> -	u32 airtime = 0;
>>>>>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>>>>>> +
>>>>>>>>>>> +	if (!txq)
>>>>>>>>>>> +		return;
>>>>>>>>>>> 
>>>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>>>>  		airtime += tx_airtime;
>>>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>>>>  		airtime += rx_airtime;
>>>>>>>>>>> 
>>>>>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>>>>>> +	airtime <<= 8;
>>>>>>>>>>> +
>>>>>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>>>> +
>>>>>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>>>>>> +
>>>>>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?:
>>>>>>>>>>> sta->airtime_weight;
>>>>>>>>>>> +
>>>>>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>> Hi Toke,
>> 
>> I was porting this version of ATF design to my ath10k platform and 
>> found
>> my old kernel version not supporting 64bit division. I'm wondering if 
>> it
>> is necessary to use u64 for airtime and weight_sum here though I can
>> find a solution for it. I think u32 might be enough. For airtime,
>> u32_max / 256 = 7182219 us(718 ms). As for weight_sum, u32_max / 8092 
>> us
>> = 130490, meaning we can support more than 130000 nodes with airtime
>> weight 8092 us.
> 
> As Felix said, we don't really want divides in the fast path at all. 
> And
> since the divisors are constant, we should be able to just pre-compute
> reciprocals and turn the whole thing into multiplications...
> 
>> Another finding was when I configured two 11ac STAs with different
>> airtime weight, such as 256 and 1024 meaning ratio is 1:4, the
>> throughput ratio was not roughly matching the ratio. Could you please
>> share your results? I am not sure if it is due to platform difference.
> 
> Hmm, I tested them with ath9k where things seemed to work equivalently
> to the DRR. Are you testing the same hardware with that? Would be a 
> good
> baseline.
> 
> I am on vacation until the end of the month, but can share my actual
> test results once I get back...
Hi Toke,
I saw your commit in hostapd in
http://patchwork.ozlabs.org/patch/1059334/

For dynamic and limit mode described in above hostapd patch, do I need 
to change any code in this kernel patch or any other patches am I 
missing?

After a quick look at the hostapd patch, I guess all the efforts for 
both modes are done in hostapd. Correct me if I am wrong. :)
> 
> -Toke
Toke Høiland-Jørgensen April 30, 2019, 10:39 a.m. UTC | #25
Yibo Zhao <yiboz@codeaurora.org> writes:

> On 2019-04-21 05:15, Toke Høiland-Jørgensen wrote:
>> Yibo Zhao <yiboz@codeaurora.org> writes:
>> 
>>> On 2019-04-11 19:24, Toke Høiland-Jørgensen wrote:
>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>> 
>>>>> On 2019-04-10 18:40, Toke Høiland-Jørgensen wrote:
>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>> 
>>>>>>> On 2019-04-10 04:41, Toke Høiland-Jørgensen wrote:
>>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>> 
>>>>>>>>> On 2019-04-04 16:31, Toke Høiland-Jørgensen wrote:
>>>>>>>>>> Yibo Zhao <yiboz@codeaurora.org> writes:
>>>>>>>>>> 
>>>>>>>>>>> On 2019-02-16 01:05, Toke Høiland-Jørgensen wrote:
>>>>>>>>>>>> This switches the airtime scheduler in mac80211 to use a
>>>>>>>>>>>> virtual
>>>>>>>>>>>> time-based
>>>>>>>>>>>> scheduler instead of the round-robin scheduler used before.
>>>>>>>>>>>> This
>>>>>>>>>>>> has
>>>>>>>>>>>> a
>>>>>>>>>>>> couple of advantages:
>>>>>>>>>>>> 
>>>>>>>>>>>> - No need to sync up the round-robin scheduler in
>>>>>>>>>>>> firmware/hardware
>>>>>>>>>>>> with
>>>>>>>>>>>>   the round-robin airtime scheduler.
>>>>>>>>>>>> 
>>>>>>>>>>>> - If several stations are eligible for transmission we can
>>>>>>>>>>>> schedule
>>>>>>>>>>>> both of
>>>>>>>>>>>>   them; no need to hard-block the scheduling rotation until 
>>>>>>>>>>>> the
>>>>>>>>>>>> head
>>>>>>>>>>>> of
>>>>>>>>>>>> the
>>>>>>>>>>>>   queue has used up its quantum.
>>>>>>>>>>>> 
>>>>>>>>>>>> - The check of whether a station is eligible for transmission
>>>>>>>>>>>> becomes
>>>>>>>>>>>>   simpler (in ieee80211_txq_may_transmit()).
>>>>>>>>>>>> 
>>>>>>>>>>>> The drawback is that scheduling becomes slightly more
>>>>>>>>>>>> expensive,
>>>>>>>>>>>> as
>>>>>>>>>>>> we
>>>>>>>>>>>> need
>>>>>>>>>>>> to maintain an rbtree of TXQs sorted by virtual time. This
>>>>>>>>>>>> means
>>>>>>>>>>>> that
>>>>>>>>>>>> ieee80211_register_airtime() becomes O(logN) in the number of
>>>>>>>>>>>> currently
>>>>>>>>>>>> scheduled TXQs. However, hopefully this number rarely grows 
>>>>>>>>>>>> too
>>>>>>>>>>>> big
>>>>>>>>>>>> (it's
>>>>>>>>>>>> only TXQs currently backlogged, not all associated stations),
>>>>>>>>>>>> so
>>>>>>>>>>>> it
>>>>>>>>>>>> shouldn't be too big of an issue.
>>>>>>>>>>>> 
>>>>>>>>>>>> @@ -1831,18 +1830,32 @@ void
>>>>>>>>>>>> ieee80211_sta_register_airtime(struct
>>>>>>>>>>>> ieee80211_sta *pubsta, u8 tid,
>>>>>>>>>>>>  {
>>>>>>>>>>>>  	struct sta_info *sta = container_of(pubsta, struct 
>>>>>>>>>>>> sta_info,
>>>>>>>>>>>> sta);
>>>>>>>>>>>>  	struct ieee80211_local *local = sta->sdata->local;
>>>>>>>>>>>> +	struct ieee80211_txq *txq = sta->sta.txq[tid];
>>>>>>>>>>>>  	u8 ac = ieee80211_ac_from_tid(tid);
>>>>>>>>>>>> -	u32 airtime = 0;
>>>>>>>>>>>> +	u64 airtime = 0, weight_sum;
>>>>>>>>>>>> +
>>>>>>>>>>>> +	if (!txq)
>>>>>>>>>>>> +		return;
>>>>>>>>>>>> 
>>>>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_TX)
>>>>>>>>>>>>  		airtime += tx_airtime;
>>>>>>>>>>>>  	if (sta->local->airtime_flags & AIRTIME_USE_RX)
>>>>>>>>>>>>  		airtime += rx_airtime;
>>>>>>>>>>>> 
>>>>>>>>>>>> +	/* Weights scale so the unit weight is 256 */
>>>>>>>>>>>> +	airtime <<= 8;
>>>>>>>>>>>> +
>>>>>>>>>>>>  	spin_lock_bh(&local->active_txq_lock[ac]);
>>>>>>>>>>>> +
>>>>>>>>>>>>  	sta->airtime[ac].tx_airtime += tx_airtime;
>>>>>>>>>>>>  	sta->airtime[ac].rx_airtime += rx_airtime;
>>>>>>>>>>>> -	sta->airtime[ac].deficit -= airtime;
>>>>>>>>>>>> +
>>>>>>>>>>>> +	weight_sum = local->airtime_weight_sum[ac] ?:
>>>>>>>>>>>> sta->airtime_weight;
>>>>>>>>>>>> +
>>>>>>>>>>>> +	local->airtime_v_t[ac] += airtime / weight_sum;
>>> Hi Toke,
>>> 
>>> I was porting this version of ATF design to my ath10k platform and 
>>> found
>>> my old kernel version not supporting 64bit division. I'm wondering if 
>>> it
>>> is necessary to use u64 for airtime and weight_sum here though I can
>>> find a solution for it. I think u32 might be enough. For airtime,
>>> u32_max / 256 = 7182219 us(718 ms). As for weight_sum, u32_max / 8092 
>>> us
>>> = 130490, meaning we can support more than 130000 nodes with airtime
>>> weight 8092 us.
>> 
>> As Felix said, we don't really want divides in the fast path at all. 
>> And
>> since the divisors are constant, we should be able to just pre-compute
>> reciprocals and turn the whole thing into multiplications...
>> 
>>> Another finding was when I configured two 11ac STAs with different
>>> airtime weight, such as 256 and 1024 meaning ratio is 1:4, the
>>> throughput ratio was not roughly matching the ratio. Could you please
>>> share your results? I am not sure if it is due to platform difference.
>> 
>> Hmm, I tested them with ath9k where things seemed to work equivalently
>> to the DRR. Are you testing the same hardware with that? Would be a 
>> good
>> baseline.
>> 
>> I am on vacation until the end of the month, but can share my actual
>> test results once I get back...
> Hi Toke,
> I saw your commit in hostapd in
> http://patchwork.ozlabs.org/patch/1059334/
>
> For dynamic and limit mode described in above hostapd patch, do I need 
> to change any code in this kernel patch or any other patches am I 
> missing?

Nope, the kernel just exposes the API to set weights, hostapd does
everything else :)

> After a quick look at the hostapd patch, I guess all the efforts for 
> both modes are done in hostapd. Correct me if I am wrong. :)

You are quite right!

-Toke
diff mbox series

Patch

diff --git a/net/mac80211/debugfs.c b/net/mac80211/debugfs.c
index 343ad0a915e4..93b9ed2f9451 100644
--- a/net/mac80211/debugfs.c
+++ b/net/mac80211/debugfs.c
@@ -150,6 +150,47 @@  static const struct file_operations aqm_ops = {
 	.llseek = default_llseek,
 };
 
+static ssize_t airtime_read(struct file *file,
+			    char __user *user_buf,
+			    size_t count,
+			    loff_t *ppos)
+{
+	struct ieee80211_local *local = file->private_data;
+	char buf[200];
+	u64 v_t[IEEE80211_NUM_ACS];
+	u64 wt[IEEE80211_NUM_ACS];
+	int len = 0, ac;
+
+	for (ac = 0; ac < IEEE80211_NUM_ACS; ac++) {
+		spin_lock_bh(&local->active_txq_lock[ac]);
+		v_t[ac] = local->airtime_v_t[ac];
+		wt[ac] = local->airtime_weight_sum[ac];
+		spin_unlock_bh(&local->active_txq_lock[ac]);
+	}
+	len = scnprintf(buf, sizeof(buf),
+			"\tVO         VI         BE         BK\n"
+			"Virt-t\t%-10llu %-10llu %-10llu %-10llu\n"
+			"Weight\t%-10llu %-10llu %-10llu %-10llu\n",
+			v_t[0],
+			v_t[1],
+			v_t[2],
+			v_t[3],
+			wt[0],
+			wt[1],
+			wt[2],
+			wt[3]);
+
+	return simple_read_from_buffer(user_buf, count, ppos,
+				       buf, len);
+}
+
+static const struct file_operations airtime_ops = {
+	.read = airtime_read,
+	.open = simple_open,
+	.llseek = default_llseek,
+};
+
+
 #ifdef CONFIG_PM
 static ssize_t reset_write(struct file *file, const char __user *user_buf,
 			   size_t count, loff_t *ppos)
@@ -384,8 +425,12 @@  void debugfs_hw_add(struct ieee80211_local *local)
 	if (local->ops->wake_tx_queue)
 		DEBUGFS_ADD_MODE(aqm, 0600);
 
-	debugfs_create_u16("airtime_flags", 0600,
-			   phyd, &local->airtime_flags);
+	if (wiphy_ext_feature_isset(local->hw.wiphy,
+				    NL80211_EXT_FEATURE_AIRTIME_FAIRNESS)) {
+		DEBUGFS_ADD_MODE(airtime, 0600);
+		debugfs_create_u16("airtime_flags", 0600,
+				   phyd, &local->airtime_flags);
+	}
 
 	statsd = debugfs_create_dir("statistics", phyd);
 
diff --git a/net/mac80211/debugfs_sta.c b/net/mac80211/debugfs_sta.c
index 3aa618dcc58e..80028da27b98 100644
--- a/net/mac80211/debugfs_sta.c
+++ b/net/mac80211/debugfs_sta.c
@@ -203,7 +203,7 @@  static ssize_t sta_airtime_read(struct file *file, char __user *userbuf,
 	size_t bufsz = 200;
 	char *buf = kzalloc(bufsz, GFP_KERNEL), *p = buf;
 	u64 rx_airtime = 0, tx_airtime = 0;
-	s64 deficit[IEEE80211_NUM_ACS];
+	u64 v_t[IEEE80211_NUM_ACS];
 	ssize_t rv;
 	int ac;
 
@@ -214,20 +214,20 @@  static ssize_t sta_airtime_read(struct file *file, char __user *userbuf,
 		spin_lock_bh(&local->active_txq_lock[ac]);
 		rx_airtime += sta->airtime[ac].rx_airtime;
 		tx_airtime += sta->airtime[ac].tx_airtime;
-		deficit[ac] = sta->airtime[ac].deficit;
+		v_t[ac] = sta->airtime[ac].v_t;
 		spin_unlock_bh(&local->active_txq_lock[ac]);
 	}
 
 	p += scnprintf(p, bufsz + buf - p,
 		"RX: %llu us\nTX: %llu us\nWeight: %u\n"
-		"Deficit: VO: %lld us VI: %lld us BE: %lld us BK: %lld us\n",
+		"Virt-T: VO: %lld us VI: %lld us BE: %lld us BK: %lld us\n",
 		rx_airtime,
 		tx_airtime,
 		sta->airtime_weight,
-		deficit[0],
-		deficit[1],
-		deficit[2],
-		deficit[3]);
+		v_t[0],
+		v_t[1],
+		v_t[2],
+		v_t[3]);
 
 	rv = simple_read_from_buffer(userbuf, count, ppos, buf, p - buf);
 	kfree(buf);
@@ -245,7 +245,7 @@  static ssize_t sta_airtime_write(struct file *file, const char __user *userbuf,
 		spin_lock_bh(&local->active_txq_lock[ac]);
 		sta->airtime[ac].rx_airtime = 0;
 		sta->airtime[ac].tx_airtime = 0;
-		sta->airtime[ac].deficit = sta->airtime_weight;
+		sta->airtime[ac].v_t = 0;
 		spin_unlock_bh(&local->active_txq_lock[ac]);
 	}
 
diff --git a/net/mac80211/ieee80211_i.h b/net/mac80211/ieee80211_i.h
index 056b16bce3b0..994a70ee0ec0 100644
--- a/net/mac80211/ieee80211_i.h
+++ b/net/mac80211/ieee80211_i.h
@@ -840,8 +840,7 @@  struct txq_info {
 	struct codel_vars def_cvars;
 	struct codel_stats cstats;
 	struct sk_buff_head frags;
-	struct list_head schedule_order;
-	u16 schedule_round;
+	struct rb_node schedule_order;
 	unsigned long flags;
 
 	/* keep last! */
@@ -1135,8 +1134,10 @@  struct ieee80211_local {
 
 	/* protects active_txqs and txqi->schedule_order */
 	spinlock_t active_txq_lock[IEEE80211_NUM_ACS];
-	struct list_head active_txqs[IEEE80211_NUM_ACS];
-	u16 schedule_round[IEEE80211_NUM_ACS];
+	struct rb_root_cached active_txqs[IEEE80211_NUM_ACS];
+	struct rb_node *schedule_pos[IEEE80211_NUM_ACS];
+	u64 airtime_v_t[IEEE80211_NUM_ACS];
+	u64 airtime_weight_sum[IEEE80211_NUM_ACS];
 
 	u16 airtime_flags;
 
@@ -1765,6 +1766,12 @@  int ieee80211_tx_control_port(struct wiphy *wiphy, struct net_device *dev,
 			      const u8 *buf, size_t len,
 			      const u8 *dest, __be16 proto, bool unencrypted);
 
+void ieee80211_resort_txq(struct ieee80211_hw *hw,
+			  struct ieee80211_txq *txq);
+void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
+			      struct ieee80211_txq *txq);
+
+
 /* HT */
 void ieee80211_apply_htcap_overrides(struct ieee80211_sub_if_data *sdata,
 				     struct ieee80211_sta_ht_cap *ht_cap);
diff --git a/net/mac80211/main.c b/net/mac80211/main.c
index 71005b6dfcd1..f207bc284921 100644
--- a/net/mac80211/main.c
+++ b/net/mac80211/main.c
@@ -666,7 +666,7 @@  struct ieee80211_hw *ieee80211_alloc_hw_nm(size_t priv_data_len,
 	spin_lock_init(&local->queue_stop_reason_lock);
 
 	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
-		INIT_LIST_HEAD(&local->active_txqs[i]);
+		local->active_txqs[i] = RB_ROOT_CACHED;
 		spin_lock_init(&local->active_txq_lock[i]);
 	}
 	local->airtime_flags = AIRTIME_USE_TX | AIRTIME_USE_RX;
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 11f058987a54..9d01fdd86e2d 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -389,7 +389,6 @@  struct sta_info *sta_info_alloc(struct ieee80211_sub_if_data *sdata,
 	for (i = 0; i < IEEE80211_NUM_ACS; i++) {
 		skb_queue_head_init(&sta->ps_tx_buf[i]);
 		skb_queue_head_init(&sta->tx_filtered[i]);
-		sta->airtime[i].deficit = sta->airtime_weight;
 	}
 
 	for (i = 0; i < IEEE80211_NUM_TIDS; i++)
@@ -1831,18 +1830,32 @@  void ieee80211_sta_register_airtime(struct ieee80211_sta *pubsta, u8 tid,
 {
 	struct sta_info *sta = container_of(pubsta, struct sta_info, sta);
 	struct ieee80211_local *local = sta->sdata->local;
+	struct ieee80211_txq *txq = sta->sta.txq[tid];
 	u8 ac = ieee80211_ac_from_tid(tid);
-	u32 airtime = 0;
+	u64 airtime = 0, weight_sum;
+
+	if (!txq)
+		return;
 
 	if (sta->local->airtime_flags & AIRTIME_USE_TX)
 		airtime += tx_airtime;
 	if (sta->local->airtime_flags & AIRTIME_USE_RX)
 		airtime += rx_airtime;
 
+	/* Weights scale so the unit weight is 256 */
+	airtime <<= 8;
+
 	spin_lock_bh(&local->active_txq_lock[ac]);
+
 	sta->airtime[ac].tx_airtime += tx_airtime;
 	sta->airtime[ac].rx_airtime += rx_airtime;
-	sta->airtime[ac].deficit -= airtime;
+
+	weight_sum = local->airtime_weight_sum[ac] ?: sta->airtime_weight;
+
+	local->airtime_v_t[ac] += airtime / weight_sum;
+	sta->airtime[ac].v_t += airtime / sta->airtime_weight;
+	ieee80211_resort_txq(&local->hw, txq);
+
 	spin_unlock_bh(&local->active_txq_lock[ac]);
 }
 EXPORT_SYMBOL(ieee80211_sta_register_airtime);
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 05647d835894..4b901a2a998e 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -130,11 +130,12 @@  enum ieee80211_agg_stop_reason {
 /* Debugfs flags to enable/disable use of RX/TX airtime in scheduler */
 #define AIRTIME_USE_TX		BIT(0)
 #define AIRTIME_USE_RX		BIT(1)
+#define AIRTIME_GRACE 500 /* usec of grace period before reset */
 
 struct airtime_info {
 	u64 rx_airtime;
 	u64 tx_airtime;
-	s64 deficit;
+	u64 v_t;
 };
 
 struct sta_info;
diff --git a/net/mac80211/tx.c b/net/mac80211/tx.c
index 61c7ea9de2cc..8e125df8ade0 100644
--- a/net/mac80211/tx.c
+++ b/net/mac80211/tx.c
@@ -1449,7 +1449,7 @@  void ieee80211_txq_init(struct ieee80211_sub_if_data *sdata,
 	codel_vars_init(&txqi->def_cvars);
 	codel_stats_init(&txqi->cstats);
 	__skb_queue_head_init(&txqi->frags);
-	INIT_LIST_HEAD(&txqi->schedule_order);
+	RB_CLEAR_NODE(&txqi->schedule_order);
 
 	txqi->txq.vif = &sdata->vif;
 
@@ -1493,9 +1493,7 @@  void ieee80211_txq_purge(struct ieee80211_local *local,
 	ieee80211_purge_tx_queue(&local->hw, &txqi->frags);
 	spin_unlock_bh(&fq->lock);
 
-	spin_lock_bh(&local->active_txq_lock[txqi->txq.ac]);
-	list_del_init(&txqi->schedule_order);
-	spin_unlock_bh(&local->active_txq_lock[txqi->txq.ac]);
+	ieee80211_unschedule_txq(&local->hw, &txqi->txq);
 }
 
 void ieee80211_txq_set_params(struct ieee80211_local *local)
@@ -3640,126 +3638,191 @@  EXPORT_SYMBOL(ieee80211_tx_dequeue);
 struct ieee80211_txq *ieee80211_next_txq(struct ieee80211_hw *hw, u8 ac)
 {
 	struct ieee80211_local *local = hw_to_local(hw);
+	struct rb_node *node = local->schedule_pos[ac];
 	struct txq_info *txqi = NULL;
+	bool first = false;
 
 	lockdep_assert_held(&local->active_txq_lock[ac]);
 
- begin:
-	txqi = list_first_entry_or_null(&local->active_txqs[ac],
-					struct txq_info,
-					schedule_order);
-	if (!txqi)
+	if (!node) {
+		node = rb_first_cached(&local->active_txqs[ac]);
+		first = true;
+	} else
+		node = rb_next(node);
+
+	if (!node)
 		return NULL;
 
+	txqi = container_of(node, struct txq_info, schedule_order);
+
 	if (txqi->txq.sta) {
 		struct sta_info *sta = container_of(txqi->txq.sta,
 						struct sta_info, sta);
 
-		if (sta->airtime[txqi->txq.ac].deficit < 0) {
-			sta->airtime[txqi->txq.ac].deficit +=
-				sta->airtime_weight;
-			list_move_tail(&txqi->schedule_order,
-				       &local->active_txqs[txqi->txq.ac]);
-			goto begin;
+		if (sta->airtime[ac].v_t > local->airtime_v_t[ac]) {
+			if (first)
+				local->airtime_v_t[ac] = sta->airtime[ac].v_t;
+			else
+				return NULL;
 		}
 	}
 
 
-	if (txqi->schedule_round == local->schedule_round[ac])
-		return NULL;
-
-	list_del_init(&txqi->schedule_order);
-	txqi->schedule_round = local->schedule_round[ac];
+	local->schedule_pos[ac] = node;
 	return &txqi->txq;
 }
 EXPORT_SYMBOL(ieee80211_next_txq);
 
-void ieee80211_return_txq(struct ieee80211_hw *hw,
+static void __ieee80211_insert_txq(struct rb_root_cached *root,
+				   struct txq_info *txqi, u8 ac)
+{
+	struct rb_node **new = &root->rb_root.rb_node;
+	struct rb_node *parent = NULL;
+	struct txq_info *__txqi;
+	bool leftmost = true;
+
+	while (*new) {
+		parent = *new;
+		__txqi = rb_entry(parent, struct txq_info, schedule_order);
+
+		if (!txqi->txq.sta) {
+			/* new txqi has no sta - insert to the left */
+			new = &parent->rb_left;
+		} else if (!__txqi->txq.sta) {
+			/* existing txqi has no sta - insert to the right */
+			new = &parent->rb_right;
+			leftmost = false;
+		} else {
+			struct sta_info *old_sta = container_of(__txqi->txq.sta,
+								struct sta_info,
+								sta);
+			struct sta_info *new_sta = container_of(txqi->txq.sta,
+								struct sta_info,
+								sta);
+
+			if (new_sta->airtime[ac].v_t <= old_sta->airtime[ac].v_t)
+				new = &parent->rb_left;
+			else {
+				new = &parent->rb_right;
+				leftmost = false;
+			}
+
+		}
+	}
+
+	rb_link_node(&txqi->schedule_order, parent, new);
+	rb_insert_color_cached(&txqi->schedule_order, root, leftmost);
+}
+
+void ieee80211_schedule_txq(struct ieee80211_hw *hw,
+			    struct ieee80211_txq *txq)
+	__acquires(txq_lock) __releases(txq_lock)
+{
+	struct ieee80211_local *local = hw_to_local(hw);
+	struct txq_info *txqi = to_txq_info(txq);
+	u8 ac = txq->ac;
+
+	spin_lock_bh(&local->active_txq_lock[ac]);
+
+	if (!RB_EMPTY_NODE(&txqi->schedule_order))
+		goto out;
+
+	if (txq->sta) {
+		struct sta_info *sta = container_of(txq->sta,
+						    struct sta_info, sta);
+
+		local->airtime_weight_sum[ac] += sta->airtime_weight;
+		if (local->airtime_v_t[ac] > AIRTIME_GRACE)
+			sta->airtime[ac].v_t = max(local->airtime_v_t[ac] - AIRTIME_GRACE,
+						   sta->airtime[ac].v_t);
+	}
+
+	__ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
+
+ out:
+	spin_unlock_bh(&local->active_txq_lock[ac]);
+}
+EXPORT_SYMBOL(ieee80211_schedule_txq);
+
+void ieee80211_resort_txq(struct ieee80211_hw *hw,
 			  struct ieee80211_txq *txq)
 {
 	struct ieee80211_local *local = hw_to_local(hw);
 	struct txq_info *txqi = to_txq_info(txq);
+	u8 ac = txq->ac;
+
+	if (!RB_EMPTY_NODE(&txqi->schedule_order)) {
+		rb_erase_cached(&txqi->schedule_order,
+				&local->active_txqs[ac]);
+		RB_CLEAR_NODE(&txqi->schedule_order);
+		__ieee80211_insert_txq(&local->active_txqs[ac], txqi, ac);
+	}
+}
+
+static void __ieee80211_unschedule_txq(struct ieee80211_hw *hw,
+				       struct ieee80211_txq *txq)
+{
+	struct ieee80211_local *local = hw_to_local(hw);
+	struct txq_info *txqi = to_txq_info(txq);
+	u8 ac = txq->ac;
 
 	lockdep_assert_held(&local->active_txq_lock[txq->ac]);
 
-	if (list_empty(&txqi->schedule_order) &&
-	    (!skb_queue_empty(&txqi->frags) || txqi->tin.backlog_packets)) {
-		/* If airtime accounting is active, always enqueue STAs at the
-		 * head of the list to ensure that they only get moved to the
-		 * back by the airtime DRR scheduler once they have a negative
-		 * deficit. A station that already has a negative deficit will
-		 * get immediately moved to the back of the list on the next
-		 * call to ieee80211_next_txq().
-		 */
-		if (txqi->txq.sta &&
-		    wiphy_ext_feature_isset(local->hw.wiphy,
-					    NL80211_EXT_FEATURE_AIRTIME_FAIRNESS))
-			list_add(&txqi->schedule_order,
-				 &local->active_txqs[txq->ac]);
-		else
-			list_add_tail(&txqi->schedule_order,
-				      &local->active_txqs[txq->ac]);
+	if (RB_EMPTY_NODE(&txqi->schedule_order))
+		return;
+
+	if (txq->sta) {
+		struct sta_info *sta = container_of(txq->sta,
+						    struct sta_info, sta);
+
+		local->airtime_weight_sum[ac] -= sta->airtime_weight;
 	}
+
+	rb_erase_cached(&txqi->schedule_order,
+			&local->active_txqs[txq->ac]);
+	RB_CLEAR_NODE(&txqi->schedule_order);
 }
-EXPORT_SYMBOL(ieee80211_return_txq);
 
-void ieee80211_schedule_txq(struct ieee80211_hw *hw,
-			    struct ieee80211_txq *txq)
+void ieee80211_unschedule_txq(struct ieee80211_hw *hw,
+			      struct ieee80211_txq *txq)
 	__acquires(txq_lock) __releases(txq_lock)
 {
 	struct ieee80211_local *local = hw_to_local(hw);
 
 	spin_lock_bh(&local->active_txq_lock[txq->ac]);
-	ieee80211_return_txq(hw, txq);
+	__ieee80211_unschedule_txq(hw, txq);
 	spin_unlock_bh(&local->active_txq_lock[txq->ac]);
 }
-EXPORT_SYMBOL(ieee80211_schedule_txq);
+
+void ieee80211_return_txq(struct ieee80211_hw *hw,
+			  struct ieee80211_txq *txq)
+{
+	struct ieee80211_local *local = hw_to_local(hw);
+	struct txq_info *txqi = to_txq_info(txq);
+
+	lockdep_assert_held(&local->active_txq_lock[txq->ac]);
+
+	if (!RB_EMPTY_NODE(&txqi->schedule_order) &&
+	    (skb_queue_empty(&txqi->frags) && !txqi->tin.backlog_packets))
+		__ieee80211_unschedule_txq(hw, txq);
+}
+EXPORT_SYMBOL(ieee80211_return_txq);
 
 bool ieee80211_txq_may_transmit(struct ieee80211_hw *hw,
 				struct ieee80211_txq *txq)
 {
 	struct ieee80211_local *local = hw_to_local(hw);
-	struct txq_info *iter, *tmp, *txqi = to_txq_info(txq);
+	struct txq_info *txqi = to_txq_info(txq);
 	struct sta_info *sta;
 	u8 ac = txq->ac;
 
 	lockdep_assert_held(&local->active_txq_lock[ac]);
 
 	if (!txqi->txq.sta)
-		goto out;
-
-	if (list_empty(&txqi->schedule_order))
-		goto out;
-
-	list_for_each_entry_safe(iter, tmp, &local->active_txqs[ac],
-				 schedule_order) {
-		if (iter == txqi)
-			break;
-
-		if (!iter->txq.sta) {
-			list_move_tail(&iter->schedule_order,
-				       &local->active_txqs[ac]);
-			continue;
-		}
-		sta = container_of(iter->txq.sta, struct sta_info, sta);
-		if (sta->airtime[ac].deficit < 0)
-			sta->airtime[ac].deficit += sta->airtime_weight;
-		list_move_tail(&iter->schedule_order, &local->active_txqs[ac]);
-	}
+		return true;
 
 	sta = container_of(txqi->txq.sta, struct sta_info, sta);
-	if (sta->airtime[ac].deficit >= 0)
-		goto out;
-
-	sta->airtime[ac].deficit += sta->airtime_weight;
-	list_move_tail(&txqi->schedule_order, &local->active_txqs[ac]);
-
-	return false;
-out:
-	if (!list_empty(&txqi->schedule_order))
-		list_del_init(&txqi->schedule_order);
-
-	return true;
+	return (sta->airtime[ac].v_t <= local->airtime_v_t[ac]);
 }
 EXPORT_SYMBOL(ieee80211_txq_may_transmit);
 
@@ -3769,7 +3832,6 @@  void ieee80211_txq_schedule_start(struct ieee80211_hw *hw, u8 ac)
 	struct ieee80211_local *local = hw_to_local(hw);
 
 	spin_lock_bh(&local->active_txq_lock[ac]);
-	local->schedule_round[ac]++;
 }
 EXPORT_SYMBOL(ieee80211_txq_schedule_start);
 
@@ -3778,6 +3840,7 @@  void ieee80211_txq_schedule_end(struct ieee80211_hw *hw, u8 ac)
 {
 	struct ieee80211_local *local = hw_to_local(hw);
 
+	local->schedule_pos[ac] = NULL;
 	spin_unlock_bh(&local->active_txq_lock[ac]);
 }
 EXPORT_SYMBOL(ieee80211_txq_schedule_end);