Message ID | DB9PR01MB73541FED9E91AC3005D27DAEE4869@DB9PR01MB7354.eurprd01.prod.exchangelabs.com (mailing list archive) |
---|---|
State | RFC |
Delegated to: | Johannes Berg |
Headers | show |
Series | [RFC,v4] mac80211: fix rx blockack session race condition | expand |
On Thu, 2021-10-28 at 10:25 +0000, Jean-Pierre TOSONI wrote: > + spin_lock_bh(&rx->sta->ampdu_mlme.rx_offl_lock); > if (!test_bit(tid, rx->sta->ampdu_mlme.agg_session_valid) && > - !test_and_set_bit(tid, rx->sta->ampdu_mlme.unexpected_agg)) > + /* back_req is allowed if the fw just received addba */ > If we're going to add an unconditional lock here, I see little reason to have all the test_bit()/test_and_set_bit() etc. I really don't think it's _right_ to add an unconditional lock here though, if it can at all be avoided. johannes
> -----Message d'origine----- > De : Johannes Berg <johannes@sipsolutions.net> > Envoyé : vendredi 26 novembre 2021 13:05 > À : Jean-Pierre TOSONI <jp.tosoni@acksys.fr>; 'linux- > wireless@vger.kernel.org' <linux-wireless@vger.kernel.org> > Objet : Re: [RFC v4] mac80211: fix rx blockack session race condition > > On Thu, 2021-10-28 at 10:25 +0000, Jean-Pierre TOSONI wrote: > > > + spin_lock_bh(&rx->sta->ampdu_mlme.rx_offl_lock); > > if (!test_bit(tid, rx->sta->ampdu_mlme.agg_session_valid) > && > > - !test_and_set_bit(tid, rx->sta- > >ampdu_mlme.unexpected_agg)) > > + /* back_req is allowed if the fw just received addba */ > > > > If we're going to add an unconditional lock here, I see little reason to > have all the test_bit()/test_and_set_bit() etc. The lock specifically protects the successive testing of agg_session_valid and tid_rx_manage_offl. So I could just isolate the agg_session_valid condition as major condition then take the spinlock then test&set agg_session_valid and tid_rx_manage_offl as a minor condition, so the spinlock would be conditional. But see below. > > I really don't think it's _right_ to add an unconditional lock here > though, if it can at all be avoided. > > johannes After extensive testing, adding this lock was effective but only removed half of the DELBA frames wrongly sent. I suspect there is another race condition somewhere else. Thinking about it, here the DELBA is sent because the ath10k driver passes a BAREQ before the signaling the establishment of the BA session. either the firmware sent the two events in wrong order, or there was actually an inversion in the frames sent by the host (in my testing, Wireshark shows the first case only). In the first case, we should not send a DELBA at all, because actually, the session is correctly set up. The second case should be detected and acted upon by ath10k itself since it was responsible for acking the ADDBA; hence mac80211 should not send a -duplicate- DELBA either. So, I completely removed this call to ieee80211_send_delba() when working with ath10k, and my RX BA sessions do not break anymore. (I added my own hardware flag IEEE80211_HW_RX_AMPDU_IN_HW similar to TX) With my hardware it works, but I have no idea how to identify chipsets/drivers/firmwares that will handle this correctly. So I cannot submit a generic patch. Please close this request, I will stay with my proprietary fix. Hopefully someone with more ath10k/firmware knowledge will find a complete way to fix. Thanks for the review, Jean-Pierre
Index: bp/net/mac80211/rx.c =================================================================== --- bp.orig/net/mac80211/rx.c +++ bp/net/mac80211/rx.c @@ -3085,11 +3085,18 @@ ieee80211_rx_h_ctrl(struct ieee80211_rx_ tid = le16_to_cpu(bar_data.control) >> 12; + spin_lock_bh(&rx->sta->ampdu_mlme.rx_offl_lock); if (!test_bit(tid, rx->sta->ampdu_mlme.agg_session_valid) && - !test_and_set_bit(tid, rx->sta->ampdu_mlme.unexpected_agg)) + /* back_req is allowed if the fw just received addba */ + !test_bit(tid, rx->sta->ampdu_mlme.tid_rx_manage_offl) && + !test_and_set_bit(tid, rx->sta->ampdu_mlme.unexpected_agg)) { + spin_unlock_bh(&rx->sta->ampdu_mlme.rx_offl_lock); ieee80211_send_delba(rx->sdata, rx->sta->sta.addr, tid, WLAN_BACK_RECIPIENT, WLAN_REASON_QSTA_REQUIRE_SETUP); + } else { + spin_unlock_bh(&rx->sta->ampdu_mlme.rx_offl_lock); + } tid_agg_rx = rcu_dereference(rx->sta->ampdu_mlme.tid_rx[tid]); if (!tid_agg_rx) Index: bp/net/mac80211/ht.c =================================================================== --- bp.orig/net/mac80211/ht.c +++ bp/net/mac80211/ht.c @@ -360,12 +360,14 @@ void ieee80211_ba_session_work(struct wo sta, tid, WLAN_BACK_RECIPIENT, WLAN_REASON_UNSPECIFIED, true); + spin_lock_bh(&sta->ampdu_mlme.rx_offl_lock); if (!blocked && test_and_clear_bit(tid, sta->ampdu_mlme.tid_rx_manage_offl)) ___ieee80211_start_rx_ba_session(sta, 0, 0, 0, 1, tid, IEEE80211_MAX_AMPDU_BUF_HT, false, true, NULL); + spin_unlock_bh(&sta->ampdu_mlme.rx_offl_lock); if (test_and_clear_bit(tid + IEEE80211_NUM_TIDS, sta->ampdu_mlme.tid_rx_manage_offl)) Index: bp/net/mac80211/sta_info.c =================================================================== --- bp.orig/net/mac80211/sta_info.c +++ bp/net/mac80211/sta_info.c @@ -354,6 +354,7 @@ struct sta_info *sta_info_alloc(struct i spin_lock_init(&sta->lock); spin_lock_init(&sta->ps_lock); + spin_lock_init(&sta->ampdu_mlme.rx_offl_lock); INIT_WORK(&sta->drv_deliver_wk, sta_deliver_ps_frames); INIT_WORK(&sta->ampdu_mlme.work, ieee80211_ba_session_work); mutex_init(&sta->ampdu_mlme.mtx); Index: bp/net/mac80211/sta_info.h =================================================================== --- bp.orig/net/mac80211/sta_info.h +++ bp/net/mac80211/sta_info.h @@ -266,6 +266,7 @@ struct tid_ampdu_rx { * @mtx: mutex to protect all TX data (except non-NULL assignments * to tid_tx[idx], which are protected by the sta spinlock) * tid_start_tx is also protected by sta->lock. + * @rx_offl_lock: protects transfer from tid_rx_manage_offl to agg_session_valid * @tid_rx: aggregation info for Rx per TID -- RCU protected * @tid_rx_token: dialog tokens for valid aggregation sessions * @tid_rx_timer_expired: bitmap indicating on which TIDs the @@ -287,6 +288,7 @@ struct tid_ampdu_rx { struct sta_ampdu_mlme { struct mutex mtx; /* rx */ + spinlock_t rx_offl_lock; struct tid_ampdu_rx __rcu *tid_rx[IEEE80211_NUM_TIDS]; u8 tid_rx_token[IEEE80211_NUM_TIDS]; unsigned long tid_rx_timer_expired[BITS_TO_LONGS(IEEE80211_NUM_TIDS)];
When the mac80211 layer is used with ath10k, the following may happen: a) radio card firmware receives ADDBA-REQ from peer b) radio sends back ADDBA-RESP to peer and signals the ath10k driver c) ath10k calls back ieee80211_manage_rx_ba_offl() in mac80211 to signal rx ba offloading d) mac80211::agg-rx.c::ieee80211_manage_rx_ba_offl() d1) sets a flag: sta->ampdu_mlme.tid_rx_manage_offl d2) queues a call to ht.c::ieee80211_ba_session_work() e) ...scheduler runs... f) ht.c::ieee80211_ba_session_work() checks the flag, clears it and sets up the rx ba session. During (e), a fast peer may already have sent a BAREQ which is propagated to rx.c::ieee80211_rx_h_ctrl(). Since the session is not yet established, mac80211 sends back a DELBA to the peer, which can hang the BA session. The phenomenon can be observed between two QCA988X fw 10.2.4 radios, using a loop of associate/arping from client to AP/disconnect. After a few thousand loops, arping does not get a response and a sniffer detects a DELBA action frame from the client, following an ADDBA. Fix: 1) check the offload flag in addition to the check for a valid aggregation session 2) protect the paired checks of (offload flag, valid aggregation session) with a spinlock against interference from ieee80211_ba_session_work(). Note 1: there is another dubious DELBA generation in ieee80211_rx_reorder_ampdu(), where the same kind of fix should fit, but I did not fix it since I knew no easy way to test. Note 2: this fix was tested against a wireless backport from 5.4-rc8. Signed-off-by: Jean-Pierre Tosoni <jp.tosoni@acksys.fr> --- V2: remove debugging code leftovers, sorry for that V3: use spin_lock_bh instead of a mutex V4: spinlock must protect ___ieee80211_start_rx_ba_session instead of ___ieee80211_stop_rx_ba_session -- quilt 0.63