Message ID | 20160804214940.78476-1-nbd@nbd.name (mailing list archive) |
---|---|
State | Accepted |
Commit | d94a461d7a7df68991fb9663531173f60ef89c68 |
Delegated to: | Kalle Valo |
Headers | show |
On 08/04/2016 11:49 PM, Felix Fietkau wrote: > It removes the need for undoing the padding changes to skb->data and it > improves performance by eliminating one tx status lookup per MPDU in the > status path. It is also useful for preparing a follow-up fix to better > handle powersave filtering. > For me, this one introduces a regression to the statistics, e.g. 'dot11TransmittedFragmentCount' is now accounted differently since it is not updated from within ieee80211_tx_status_noskb(). Cheers, Zefir -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 2016-08-11 18:05, Zefir Kurtisi wrote: > On 08/04/2016 11:49 PM, Felix Fietkau wrote: >> It removes the need for undoing the padding changes to skb->data and it >> improves performance by eliminating one tx status lookup per MPDU in the >> status path. It is also useful for preparing a follow-up fix to better >> handle powersave filtering. >> > > For me, this one introduces a regression to the statistics, e.g. > 'dot11TransmittedFragmentCount' is now accounted differently since it is not > updated from within ieee80211_tx_status_noskb(). Is this important? I guess it would be possible to make this more accurate by extending the API, but I wonder if that's worth doing just for these debugfs counters. - Felix -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 08/11/2016 09:27 PM, Felix Fietkau wrote: > On 2016-08-11 18:05, Zefir Kurtisi wrote: >> On 08/04/2016 11:49 PM, Felix Fietkau wrote: >>> It removes the need for undoing the padding changes to skb->data and it >>> improves performance by eliminating one tx status lookup per MPDU in the >>> status path. It is also useful for preparing a follow-up fix to better >>> handle powersave filtering. >>> >> >> For me, this one introduces a regression to the statistics, e.g. >> 'dot11TransmittedFragmentCount' is now accounted differently since it is not >> updated from within ieee80211_tx_status_noskb(). > Is this important? I guess it would be possible to make this more > accurate by extending the API, but I wonder if that's worth doing just > for these debugfs counters. > If you want to support the IEEE802dot11.MIB (dot11mac.dot11CountersTable), they are important. Not sure though if it is used by others besides us. I think the real issue here is that those counters are regarded as internal debug values (as the comments or e.g. commit c206ca6709 state) instead of general purpose statistics that is exposed to SNMP. As such, they should be configurable as a feature independent of debug settings in mac80211. Aside from that consideration, this patch (with the follow up ones) increased peak performance noticeably (we measure some 5%+ higher peak throughput), which for sure is worth dropping the counters for most. I am fine handling this internally. A note in the commit message would be helpful that states that counters dot11TransmittedFragmentCount, dot11FrameDuplicateCount, dot11ReceivedFragmentCount, and dot11MulticastReceivedFrameCount become invalid. Cheers, Zefir -- To unsubscribe from this list: send the line "unsubscribe linux-wireless" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Zefir Kurtisi <zefir.kurtisi@neratec.com> writes: > On 08/11/2016 09:27 PM, Felix Fietkau wrote: >> On 2016-08-11 18:05, Zefir Kurtisi wrote: >>> On 08/04/2016 11:49 PM, Felix Fietkau wrote: >>>> It removes the need for undoing the padding changes to skb->data and it >>>> improves performance by eliminating one tx status lookup per MPDU in the >>>> status path. It is also useful for preparing a follow-up fix to better >>>> handle powersave filtering. >>>> >>> >>> For me, this one introduces a regression to the statistics, e.g. >>> 'dot11TransmittedFragmentCount' is now accounted differently since it is not >>> updated from within ieee80211_tx_status_noskb(). >> Is this important? I guess it would be possible to make this more >> accurate by extending the API, but I wonder if that's worth doing just >> for these debugfs counters. >> > If you want to support the IEEE802dot11.MIB (dot11mac.dot11CountersTable), they > are important. Not sure though if it is used by others besides us. > > I think the real issue here is that those counters are regarded as internal debug > values (as the comments or e.g. commit c206ca6709 state) instead of general > purpose statistics that is exposed to SNMP. As such, they should be configurable > as a feature independent of debug settings in mac80211. > > > Aside from that consideration, this patch (with the follow up ones) increased peak > performance noticeably (we measure some 5%+ higher peak throughput), which for > sure is worth dropping the counters for most. > > I am fine handling this internally. A note in the commit message would be helpful > that states that counters dot11TransmittedFragmentCount, dot11FrameDuplicateCount, > dot11ReceivedFragmentCount, and dot11MulticastReceivedFrameCount become invalid. A good idea, I updated the commit log to mention that. Does that look ok? Author: Felix Fietkau <nbd@nbd.name> Date: Fri Sep 2 19:46:12 2016 +0300 ath9k: use ieee80211_tx_status_noskb where possible It removes the need for undoing the padding changes to skb->data and it improves performance by eliminating one tx status lookup per MPDU in the status path. It is also useful for preparing a follow-up fix to better handle powersave filtering. A side effect is that these counters, available via debugfs, become now invalid: * dot11TransmittedFragmentCount * dot11FrameDuplicateCount, * dot11ReceivedFragmentCount * dot11MulticastReceivedFrameCount Signed-off-by: Felix Fietkau <nbd@nbd.name> Patchwork-Id: 9264385 [kvalo@qca.qualcomm.com: add a note about counters, thanks to Zefir Kurtisi] Signed-off-by: Kalle Valo <kvalo@qca.qualcomm.com>
On 2016-09-08 13:41, Kalle Valo wrote: > Zefir Kurtisi <zefir.kurtisi@neratec.com> writes: > >> On 08/11/2016 09:27 PM, Felix Fietkau wrote: >>> On 2016-08-11 18:05, Zefir Kurtisi wrote: >>>> On 08/04/2016 11:49 PM, Felix Fietkau wrote: >>>>> It removes the need for undoing the padding changes to skb->data and it >>>>> improves performance by eliminating one tx status lookup per MPDU in the >>>>> status path. It is also useful for preparing a follow-up fix to better >>>>> handle powersave filtering. >>>>> >>>> >>>> For me, this one introduces a regression to the statistics, e.g. >>>> 'dot11TransmittedFragmentCount' is now accounted differently since it is not >>>> updated from within ieee80211_tx_status_noskb(). >>> Is this important? I guess it would be possible to make this more >>> accurate by extending the API, but I wonder if that's worth doing just >>> for these debugfs counters. >>> >> If you want to support the IEEE802dot11.MIB (dot11mac.dot11CountersTable), they >> are important. Not sure though if it is used by others besides us. >> >> I think the real issue here is that those counters are regarded as internal debug >> values (as the comments or e.g. commit c206ca6709 state) instead of general >> purpose statistics that is exposed to SNMP. As such, they should be configurable >> as a feature independent of debug settings in mac80211. >> >> >> Aside from that consideration, this patch (with the follow up ones) increased peak >> performance noticeably (we measure some 5%+ higher peak throughput), which for >> sure is worth dropping the counters for most. >> >> I am fine handling this internally. A note in the commit message would be helpful >> that states that counters dot11TransmittedFragmentCount, dot11FrameDuplicateCount, >> dot11ReceivedFragmentCount, and dot11MulticastReceivedFrameCount become invalid. > > A good idea, I updated the commit log to mention that. Does that look > ok? > > Author: Felix Fietkau <nbd@nbd.name> > Date: Fri Sep 2 19:46:12 2016 +0300 > > ath9k: use ieee80211_tx_status_noskb where possible > > It removes the need for undoing the padding changes to skb->data and it > improves performance by eliminating one tx status lookup per MPDU in the > status path. It is also useful for preparing a follow-up fix to better > handle powersave filtering. > > A side effect is that these counters, available via debugfs, become now invalid: > > * dot11TransmittedFragmentCount > * dot11FrameDuplicateCount, > * dot11ReceivedFragmentCount > * dot11MulticastReceivedFrameCount > > Signed-off-by: Felix Fietkau <nbd@nbd.name> Looks good to me. - Felix
Felix Fietkau <nbd@nbd.name> wrote: > It removes the need for undoing the padding changes to skb->data and it > improves performance by eliminating one tx status lookup per MPDU in the > status path. It is also useful for preparing a follow-up fix to better > handle powersave filtering. > > Signed-off-by: Felix Fietkau <nbd@nbd.name> Thanks, 2 patches applied to ath-next branch of ath.git: d94a461d7a7d ath9k: use ieee80211_tx_status_noskb where possible 315c457ff123 ath9k: improve powersave filter handling
diff --git a/drivers/net/wireless/ath/ath9k/xmit.c b/drivers/net/wireless/ath/ath9k/xmit.c index 39d9383..5693558 100644 --- a/drivers/net/wireless/ath/ath9k/xmit.c +++ b/drivers/net/wireless/ath/ath9k/xmit.c @@ -50,9 +50,11 @@ static u16 bits_per_symbol[][2] = { static void ath_tx_send_normal(struct ath_softc *sc, struct ath_txq *txq, struct ath_atx_tid *tid, struct sk_buff *skb); static void ath_tx_complete(struct ath_softc *sc, struct sk_buff *skb, - int tx_flags, struct ath_txq *txq); + int tx_flags, struct ath_txq *txq, + struct ieee80211_sta *sta); static void ath_tx_complete_buf(struct ath_softc *sc, struct ath_buf *bf, struct ath_txq *txq, struct list_head *bf_q, + struct ieee80211_sta *sta, struct ath_tx_status *ts, int txok); static void ath_tx_txqaddbuf(struct ath_softc *sc, struct ath_txq *txq, struct list_head *head, bool internal); @@ -77,6 +79,22 @@ enum { /* Aggregation logic */ /*********************/ +static void ath_tx_status(struct ieee80211_hw *hw, struct sk_buff *skb) +{ + struct ieee80211_tx_info *info = IEEE80211_SKB_CB(skb); + struct ieee80211_sta *sta = info->status.status_driver_data[0]; + + if (info->flags & IEEE80211_TX_CTL_REQ_TX_STATUS) { + ieee80211_tx_status(hw, skb); + return; + } + + if (sta) + ieee80211_tx_status_noskb(hw, sta, info); + + dev_kfree_skb(skb); +} + void ath_txq_lock(struct ath_softc *sc, struct ath_txq *txq) __acquires(&txq->axq_lock) { @@ -92,6 +110,7 @@ void ath_txq_unlock(struct ath_softc *sc, struct ath_txq *txq) void ath_txq_unlock_complete(struct ath_softc *sc, struct ath_txq *txq) __releases(&txq->axq_lock) { + struct ieee80211_hw *hw = sc->hw; struct sk_buff_head q; struct sk_buff *skb; @@ -100,7 +119,7 @@ void ath_txq_unlock_complete(struct ath_softc *sc, struct ath_txq *txq) spin_unlock_bh(&txq->axq_lock); while ((skb = __skb_dequeue(&q))) - ieee80211_tx_status(sc->hw, skb); + ath_tx_status(hw, skb); } static void ath_tx_queue_tid(struct ath_softc *sc, struct ath_txq *txq, @@ -268,7 +287,7 @@ static void ath_tx_flush_tid(struct ath_softc *sc, struct ath_atx_tid *tid) } list_add_tail(&bf->list, &bf_head); - ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0); + ath_tx_complete_buf(sc, bf, txq, &bf_head, NULL, &ts, 0); } if (sendbar) { @@ -333,12 +352,12 @@ static void ath_tid_drain(struct ath_softc *sc, struct ath_txq *txq, bf = fi->bf; if (!bf) { - ath_tx_complete(sc, skb, ATH_TX_ERROR, txq); + ath_tx_complete(sc, skb, ATH_TX_ERROR, txq, NULL); continue; } list_add_tail(&bf->list, &bf_head); - ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0); + ath_tx_complete_buf(sc, bf, txq, &bf_head, NULL, &ts, 0); } } @@ -441,12 +460,11 @@ static void ath_tx_count_frames(struct ath_softc *sc, struct ath_buf *bf, static void ath_tx_complete_aggr(struct ath_softc *sc, struct ath_txq *txq, struct ath_buf *bf, struct list_head *bf_q, + struct ieee80211_sta *sta, struct ath_tx_status *ts, int txok) { struct ath_node *an = NULL; struct sk_buff *skb; - struct ieee80211_sta *sta; - struct ieee80211_hw *hw = sc->hw; struct ieee80211_hdr *hdr; struct ieee80211_tx_info *tx_info; struct ath_atx_tid *tid = NULL; @@ -475,12 +493,7 @@ static void ath_tx_complete_aggr(struct ath_softc *sc, struct ath_txq *txq, for (i = 0; i < ts->ts_rateindex; i++) retries += rates[i].count; - rcu_read_lock(); - - sta = ieee80211_find_sta_by_ifaddr(hw, hdr->addr1, hdr->addr2); if (!sta) { - rcu_read_unlock(); - INIT_LIST_HEAD(&bf_head); while (bf) { bf_next = bf->bf_next; @@ -488,7 +501,7 @@ static void ath_tx_complete_aggr(struct ath_softc *sc, struct ath_txq *txq, if (!bf->bf_state.stale || bf_next != NULL) list_move_tail(&bf->list, &bf_head); - ath_tx_complete_buf(sc, bf, txq, &bf_head, ts, 0); + ath_tx_complete_buf(sc, bf, txq, &bf_head, NULL, ts, 0); bf = bf_next; } @@ -598,7 +611,7 @@ static void ath_tx_complete_aggr(struct ath_softc *sc, struct ath_txq *txq, ts); } - ath_tx_complete_buf(sc, bf, txq, &bf_head, ts, + ath_tx_complete_buf(sc, bf, txq, &bf_head, sta, ts, !txfail); } else { if (tx_info->flags & IEEE80211_TX_STATUS_EOSP) { @@ -619,7 +632,8 @@ static void ath_tx_complete_aggr(struct ath_softc *sc, struct ath_txq *txq, ath_tx_update_baw(sc, tid, seqno); ath_tx_complete_buf(sc, bf, txq, - &bf_head, ts, 0); + &bf_head, NULL, ts, + 0); bar_index = max_t(int, bar_index, ATH_BA_INDEX(seq_first, seqno)); break; @@ -663,8 +677,6 @@ static void ath_tx_complete_aggr(struct ath_softc *sc, struct ath_txq *txq, ath_txq_lock(sc, txq); } - rcu_read_unlock(); - if (needreset) ath9k_queue_reset(sc, RESET_TYPE_TX_ERROR); } @@ -679,7 +691,10 @@ static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq, struct ath_tx_status *ts, struct ath_buf *bf, struct list_head *bf_head) { + struct ieee80211_hw *hw = sc->hw; struct ieee80211_tx_info *info; + struct ieee80211_sta *sta; + struct ieee80211_hdr *hdr; bool txok, flush; txok = !(ts->ts_status & ATH9K_TXERR_MASK); @@ -692,6 +707,10 @@ static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq, ts->duration = ath9k_hw_get_duration(sc->sc_ah, bf->bf_desc, ts->ts_rateindex); + + hdr = (struct ieee80211_hdr *) bf->bf_mpdu->data; + sta = ieee80211_find_sta_by_ifaddr(hw, hdr->addr1, hdr->addr2); + if (!bf_isampdu(bf)) { if (!flush) { info = IEEE80211_SKB_CB(bf->bf_mpdu); @@ -700,9 +719,9 @@ static void ath_tx_process_buffer(struct ath_softc *sc, struct ath_txq *txq, ath_tx_rc_status(sc, bf, ts, 1, txok ? 0 : 1, txok); ath_dynack_sample_tx_ts(sc->sc_ah, bf->bf_mpdu, ts); } - ath_tx_complete_buf(sc, bf, txq, bf_head, ts, txok); + ath_tx_complete_buf(sc, bf, txq, bf_head, sta, ts, txok); } else - ath_tx_complete_aggr(sc, txq, bf, bf_head, ts, txok); + ath_tx_complete_aggr(sc, txq, bf, bf_head, sta, ts, txok); if (!flush) ath_txq_schedule(sc, txq); @@ -938,7 +957,7 @@ ath_tx_get_tid_subframe(struct ath_softc *sc, struct ath_txq *txq, list_add(&bf->list, &bf_head); __skb_unlink(skb, *q); ath_tx_update_baw(sc, tid, seqno); - ath_tx_complete_buf(sc, bf, txq, &bf_head, &ts, 0); + ath_tx_complete_buf(sc, bf, txq, &bf_head, NULL, &ts, 0); continue; } @@ -1847,6 +1866,7 @@ static void ath_drain_txq_list(struct ath_softc *sc, struct ath_txq *txq, */ void ath_draintxq(struct ath_softc *sc, struct ath_txq *txq) { + rcu_read_lock(); ath_txq_lock(sc, txq); if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA) { @@ -1865,6 +1885,7 @@ void ath_draintxq(struct ath_softc *sc, struct ath_txq *txq) ath_drain_txq_list(sc, txq, &txq->axq_q); ath_txq_unlock_complete(sc, txq); + rcu_read_unlock(); } bool ath_drain_all_txq(struct ath_softc *sc) @@ -2487,7 +2508,8 @@ void ath_tx_cabq(struct ieee80211_hw *hw, struct ieee80211_vif *vif, /*****************/ static void ath_tx_complete(struct ath_softc *sc, struct sk_buff *skb, - int tx_flags, struct ath_txq *txq) + int tx_flags, struct ath_txq *txq, + struct ieee80211_sta *sta) { struct ieee80211_tx_info *tx_info = IEEE80211_SKB_CB(skb); struct ath_common *common = ath9k_hw_common(sc->sc_ah); @@ -2507,15 +2529,17 @@ static void ath_tx_complete(struct ath_softc *sc, struct sk_buff *skb, tx_info->flags |= IEEE80211_TX_STAT_ACK; } - padpos = ieee80211_hdrlen(hdr->frame_control); - padsize = padpos & 3; - if (padsize && skb->len>padpos+padsize) { - /* - * Remove MAC header padding before giving the frame back to - * mac80211. - */ - memmove(skb->data + padsize, skb->data, padpos); - skb_pull(skb, padsize); + if (tx_info->flags & IEEE80211_TX_CTL_REQ_TX_STATUS) { + padpos = ieee80211_hdrlen(hdr->frame_control); + padsize = padpos & 3; + if (padsize && skb->len>padpos+padsize) { + /* + * Remove MAC header padding before giving the frame back to + * mac80211. + */ + memmove(skb->data + padsize, skb->data, padpos); + skb_pull(skb, padsize); + } } spin_lock_irqsave(&sc->sc_pm_lock, flags); @@ -2530,12 +2554,14 @@ static void ath_tx_complete(struct ath_softc *sc, struct sk_buff *skb, } spin_unlock_irqrestore(&sc->sc_pm_lock, flags); - __skb_queue_tail(&txq->complete_q, skb); ath_txq_skb_done(sc, txq, skb); + tx_info->status.status_driver_data[0] = sta; + __skb_queue_tail(&txq->complete_q, skb); } static void ath_tx_complete_buf(struct ath_softc *sc, struct ath_buf *bf, struct ath_txq *txq, struct list_head *bf_q, + struct ieee80211_sta *sta, struct ath_tx_status *ts, int txok) { struct sk_buff *skb = bf->bf_mpdu; @@ -2563,7 +2589,7 @@ static void ath_tx_complete_buf(struct ath_softc *sc, struct ath_buf *bf, complete(&sc->paprd_complete); } else { ath_debug_stat_tx(sc, bf, ts, txq, tx_flags); - ath_tx_complete(sc, skb, tx_flags, txq); + ath_tx_complete(sc, skb, tx_flags, txq, sta); } skip_tx_complete: /* At this point, skb (bf->bf_mpdu) is consumed...make sure we don't @@ -2715,10 +2741,12 @@ void ath_tx_tasklet(struct ath_softc *sc) u32 qcumask = ((1 << ATH9K_NUM_TX_QUEUES) - 1) & ah->intr_txqs; int i; + rcu_read_lock(); for (i = 0; i < ATH9K_NUM_TX_QUEUES; i++) { if (ATH_TXQ_SETUP(sc, i) && (qcumask & (1 << i))) ath_tx_processq(sc, &sc->tx.txq[i]); } + rcu_read_unlock(); } void ath_tx_edma_tasklet(struct ath_softc *sc) @@ -2732,6 +2760,7 @@ void ath_tx_edma_tasklet(struct ath_softc *sc) struct list_head *fifo_list; int status; + rcu_read_lock(); for (;;) { if (test_bit(ATH_OP_HW_RESET, &common->op_flags)) break; @@ -2802,6 +2831,7 @@ void ath_tx_edma_tasklet(struct ath_softc *sc) ath_tx_process_buffer(sc, txq, &ts, bf, &bf_head); ath_txq_unlock_complete(sc, txq); } + rcu_read_unlock(); } /*****************/
It removes the need for undoing the padding changes to skb->data and it improves performance by eliminating one tx status lookup per MPDU in the status path. It is also useful for preparing a follow-up fix to better handle powersave filtering. Signed-off-by: Felix Fietkau <nbd@nbd.name> --- drivers/net/wireless/ath/ath9k/xmit.c | 94 +++++++++++++++++++++++------------ 1 file changed, 62 insertions(+), 32 deletions(-)