diff mbox

[RFC,v3] mac80211: lock rate control

Message ID 1426061656-17546-1-git-send-email-sven@narfation.org (mailing list archive)
State RFC
Headers show

Commit Message

Sven Eckelmann March 11, 2015, 8:14 a.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

Both minstrel (reported by Sven Eckelmann) and the iwlwifi rate
control aren't properly taking concurrency into account. It's
likely that the same is true for other rate control algorithms.

In the case of minstrel this manifests itself in crashes when an
update and other data access are run concurrently, for example
when the stations change bandwidth or similar. In iwlwifi, this
can cause firmware crashes.

Since fixing all rate control algorithms will be very difficult,
just provide locking for invocations. This protects the internal
data structures the algorithms maintain.

I've manipulated hostapd to test this, by having it change its
advertised bandwidth roughly ever 150ms. At the same time, I'm
running a flood ping between the client and the AP, which causes
this race of update vs. get_rate/status to easily happen on the
client. With this change, the system survives this test.

Reported-by: Sven Eckelmann <sven@open-mesh.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
---
This is a modified version of RFCv2 with an addition from
mid.gmane.org/8006741.C7YlhOg3U7@bentobox

I've just post this modification to show which was necessary to get it working
when using IBSS. I leave the actual PATCH submission to Johannes Berg

 net/mac80211/rate.c     |  8 +++++++-
 net/mac80211/rate.h     | 14 +++++++++++---
 net/mac80211/sta_info.c |  2 +-
 net/mac80211/sta_info.h |  1 +
 4 files changed, 20 insertions(+), 5 deletions(-)

Comments

Sven Eckelmann March 11, 2015, 8:22 a.m. UTC | #1
On Wednesday 11 March 2015 09:14:15 Sven Eckelmann wrote:
> This is a modified version of RFCv2 with an addition from
> mid.gmane.org/8006741.C7YlhOg3U7@bentobox

This was near but still a miss. The correct link is 
http://mid.gmane.org/2736671.Y6Lt54SxyK@bentobox

Kind regards,
	Sven
--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Johannes Berg March 12, 2015, 6:37 p.m. UTC | #2
On Wed, 2015-03-11 at 09:14 +0100, Sven Eckelmann wrote:

> Since fixing all rate control algorithms will be very difficult,
> just provide locking for invocations. This protects the internal
> data structures the algorithms maintain.

So ... we've been running with this and found that it causes a potential
deadlock.

The sta->lock is held in many places (particularly mesh, but in one case
also for delBA) while transmitting frames, so this causes a
lock->rate_ctrl_lock dependency.

At the same time, that lock is acquired for aggregation state machines,
i.e. the exported function calls ieee80211_stop_rx_ba_session(),
ieee80211_start_tx_ba_session() and ieee80211_stop_tx_ba_session().
These are often called from the rate control algorithm (minstrel,
rtlwifi, iwlwifi, ...) and thus cause a rate_ctrl_lock->lock dependency.

Combined, this clearly creates potential for classic ABBA deadlocks.

I'm not really sure yet how to fix this. One way would be to avoid all
TX under the sta->lock, which is a relatively simple patch for the
aggregation case since there's just one place
(http://p.sipsolutions.net/f951b8da4e78112a.txt) but mesh is far more
complex and has many places that send frames while holding sta->lock.

Another way would to avoid all such things would be to just make all the
internal mac80211 TX asynchronous by punting to the TX tasklet we have
anyway, but that seems overkill.

Another option might be to let mesh use a different spinlock, but then
we'd have to be careful not to cause a lock->mesh_lock dependency since
that would again lead to a lock->mesh_lock->rate_ctrl_lock dependency,
which is still buggy since the aggregation code paths cause the other
dependency.

Yet another option might be to make the three mentioned aggregation
functions asynchronous entirely, but that might cause interesting races
in the aggregation state machines again, and some drivers even rely on
the return value which could obviously then not be given.

If anyone has any good ideas ...

johannes

--
To unsubscribe from this list: send the line "unsubscribe linux-wireless" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/net/mac80211/rate.c b/net/mac80211/rate.c
index d53355b..de69adf 100644
--- a/net/mac80211/rate.c
+++ b/net/mac80211/rate.c
@@ -683,7 +683,13 @@  void rate_control_get_rate(struct ieee80211_sub_if_data *sdata,
 	if (sdata->local->hw.flags & IEEE80211_HW_HAS_RATE_CONTROL)
 		return;
 
-	ref->ops->get_rate(ref->priv, ista, priv_sta, txrc);
+	if (ista) {
+		spin_lock_bh(&sta->rate_ctrl_lock);
+		ref->ops->get_rate(ref->priv, ista, priv_sta, txrc);
+		spin_unlock_bh(&sta->rate_ctrl_lock);
+	} else {
+		ref->ops->get_rate(ref->priv, NULL, NULL, txrc);
+	}
 
 	if (sdata->local->hw.flags & IEEE80211_HW_SUPPORTS_RC_TABLE)
 		return;
diff --git a/net/mac80211/rate.h b/net/mac80211/rate.h
index 38652f0..25c9be5 100644
--- a/net/mac80211/rate.h
+++ b/net/mac80211/rate.h
@@ -42,10 +42,12 @@  static inline void rate_control_tx_status(struct ieee80211_local *local,
 	if (!ref || !test_sta_flag(sta, WLAN_STA_RATE_CONTROL))
 		return;
 
+	spin_lock_bh(&sta->rate_ctrl_lock);
 	if (ref->ops->tx_status)
 		ref->ops->tx_status(ref->priv, sband, ista, priv_sta, skb);
 	else
 		ref->ops->tx_status_noskb(ref->priv, sband, ista, priv_sta, info);
+	spin_unlock_bh(&sta->rate_ctrl_lock);
 }
 
 static inline void
@@ -64,7 +66,9 @@  rate_control_tx_status_noskb(struct ieee80211_local *local,
 	if (WARN_ON_ONCE(!ref->ops->tx_status_noskb))
 		return;
 
+	spin_lock_bh(&sta->rate_ctrl_lock);
 	ref->ops->tx_status_noskb(ref->priv, sband, ista, priv_sta, info);
+	spin_unlock_bh(&sta->rate_ctrl_lock);
 }
 
 static inline void rate_control_rate_init(struct sta_info *sta)
@@ -91,8 +95,10 @@  static inline void rate_control_rate_init(struct sta_info *sta)
 
 	sband = local->hw.wiphy->bands[chanctx_conf->def.chan->band];
 
+	spin_lock_bh(&sta->rate_ctrl_lock);
 	ref->ops->rate_init(ref->priv, sband, &chanctx_conf->def, ista,
 			    priv_sta);
+	spin_unlock_bh(&sta->rate_ctrl_lock);
 	rcu_read_unlock();
 	set_sta_flag(sta, WLAN_STA_RATE_CONTROL);
 }
@@ -115,18 +121,20 @@  static inline void rate_control_rate_update(struct ieee80211_local *local,
 			return;
 		}
 
+		spin_lock_bh(&sta->rate_ctrl_lock);
 		ref->ops->rate_update(ref->priv, sband, &chanctx_conf->def,
 				      ista, priv_sta, changed);
+		spin_unlock_bh(&sta->rate_ctrl_lock);
 		rcu_read_unlock();
 	}
 	drv_sta_rc_update(local, sta->sdata, &sta->sta, changed);
 }
 
 static inline void *rate_control_alloc_sta(struct rate_control_ref *ref,
-					   struct ieee80211_sta *sta,
-					   gfp_t gfp)
+					   struct sta_info *sta, gfp_t gfp)
 {
-	return ref->ops->alloc_sta(ref->priv, sta, gfp);
+	spin_lock_init(&sta->rate_ctrl_lock);
+	return ref->ops->alloc_sta(ref->priv, &sta->sta, gfp);
 }
 
 static inline void rate_control_free_sta(struct sta_info *sta)
diff --git a/net/mac80211/sta_info.c b/net/mac80211/sta_info.c
index 00ca8dc..8e8f4c2 100644
--- a/net/mac80211/sta_info.c
+++ b/net/mac80211/sta_info.c
@@ -282,7 +282,7 @@  static int sta_prepare_rate_control(struct ieee80211_local *local,
 
 	sta->rate_ctrl = local->rate_ctrl;
 	sta->rate_ctrl_priv = rate_control_alloc_sta(sta->rate_ctrl,
-						     &sta->sta, gfp);
+						     sta, gfp);
 	if (!sta->rate_ctrl_priv)
 		return -ENOMEM;
 
diff --git a/net/mac80211/sta_info.h b/net/mac80211/sta_info.h
index 925e68f..a8f037f 100644
--- a/net/mac80211/sta_info.h
+++ b/net/mac80211/sta_info.h
@@ -368,6 +368,7 @@  struct sta_info {
 	u8 ptk_idx;
 	struct rate_control_ref *rate_ctrl;
 	void *rate_ctrl_priv;
+	spinlock_t rate_ctrl_lock;
 	spinlock_t lock;
 
 	struct work_struct drv_deliver_wk;