Message ID | 1484234070-7431-1-git-send-email-michal.kazior@tieto.com (mailing list archive) |
---|---|
State | Not Applicable |
Delegated to: | Kalle Valo |
Headers | show |
On Thu, 2017-01-12 at 16:14 +0100, Michal Kazior wrote: > Station pointers are RCU protected so driver must > be extra careful if it tries to store them > internally for later use outside of the RCU > section it obtained it in. > > It was possible for station teardown to race with > some htt events. The possible outcome could be a > use-after-free and a crash. > > Only peer-flow-control capable firmware was > affected (so hardware-wise qca99x0 and qca4019). > > This could be done in sta_state() itself via > explicit synchronize_net() call but there's > already a convenient sta_pre_rcu_remove() op that > can be hooked up to avoid extra rcu stall. I don't think this makes sense. You're not using RCU-protected pointers to the stations yourself, all accesses to them are locked under the &ar->data_lock. As a consequence, you can't have any need for waiting for a grace period. Since you also remove the pointer (under lock) when the station gets removed, I don't think RCU can be the problem? johannes
On 12 January 2017 at 16:46, Johannes Berg <johannes@sipsolutions.net> wrote: > On Thu, 2017-01-12 at 16:14 +0100, Michal Kazior wrote: >> Station pointers are RCU protected so driver must >> be extra careful if it tries to store them >> internally for later use outside of the RCU >> section it obtained it in. >> >> It was possible for station teardown to race with >> some htt events. The possible outcome could be a >> use-after-free and a crash. >> >> Only peer-flow-control capable firmware was >> affected (so hardware-wise qca99x0 and qca4019). >> >> This could be done in sta_state() itself via >> explicit synchronize_net() call but there's >> already a convenient sta_pre_rcu_remove() op that >> can be hooked up to avoid extra rcu stall. > > I don't think this makes sense. You're not using RCU-protected pointers > to the stations yourself, all accesses to them are locked under the > &ar->data_lock. As a consequence, you can't have any need for waiting > for a grace period. Since you also remove the pointer (under lock) when > the station gets removed, I don't think RCU can be the problem? Unless you then continue to use that sta pointer after you release data_lock. Consider this: > CPU0 CPU1 > 1 synchronize_net() > 2 drv_sta_state() > 3 htt_fetch_ind(pid,tid) called > 4 rcu_read_lock() > 5 get(data_lock) > 6 txq=peers[pid]->sta->txq[tid] > 7 put(data_lock) > 8 get(data_lock) > 9 peer->sta=0 > 10 put(data_lock) > 11 kfree(sta) > 12 ieee80211_tx_dequeue(txq) Even though there's no code like (9) per se you can think of it as anything that tries to "remove" the peer--sta association (ath10k_peer is removed implicitly via wmi peer delete command and waiting for htt event completion). Holding data_lock for the entire duration of handling fetch indication isn't really good for performance so it's better to fix RCU handling. Michał
> Unless you then continue to use that sta pointer after you release > data_lock. Ouch, ok. That's rather strangely hidden though. > Consider this: > > > CPU0 CPU1 > > 1 synchronize_net() > > 2 drv_sta_state() > > 3 htt_fetch_ind(pid,tid) called > > 4 rcu_read_lock() > > 5 get(data_lock) > > 6 txq=peers[pid]->sta->txq[tid] > > 7 put(data_lock) > > 8 get(data_lock) > > 9 peer->sta=0 > > 10 put(data_lock) > > 11 kfree(sta) > > 12 ieee80211_tx_dequeue(txq) > > Even though there's no code like (9) per se you can think of it as > anything that tries to "remove" the peer--sta association > (ath10k_peer > is removed implicitly via wmi peer delete command and waiting for htt > event completion). > > Holding data_lock for the entire duration of handling fetch > indication isn't really good for performance so it's better to fix > RCU handling. Yeah, I see it now - also the comment where this happens. You probably should mark some things __rcu though and actually use RCU primitives, so the code is actually understandable :) johannes
On 13 January 2017 at 08:24, Johannes Berg <johannes@sipsolutions.net> wrote: > >> Unless you then continue to use that sta pointer after you release >> data_lock. > > Ouch, ok. That's rather strangely hidden though. > >> Consider this: >> >> > CPU0 CPU1 >> > 1 synchronize_net() >> > 2 drv_sta_state() >> > 3 htt_fetch_ind(pid,tid) called >> > 4 rcu_read_lock() >> > 5 get(data_lock) >> > 6 txq=peers[pid]->sta->txq[tid] >> > 7 put(data_lock) >> > 8 get(data_lock) >> > 9 peer->sta=0 >> > 10 put(data_lock) >> > 11 kfree(sta) >> > 12 ieee80211_tx_dequeue(txq) >> >> Even though there's no code like (9) per se you can think of it as >> anything that tries to "remove" the peer--sta association >> (ath10k_peer >> is removed implicitly via wmi peer delete command and waiting for htt >> event completion). >> >> Holding data_lock for the entire duration of handling fetch >> indication isn't really good for performance so it's better to fix >> RCU handling. > > Yeah, I see it now - also the comment where this happens. You probably > should mark some things __rcu though and actually use RCU primitives, > so the code is actually understandable :) Good point. I'll do that in another patch. Michał
Michal Kazior <michal.kazior@tieto.com> wrote: > Station pointers are RCU protected so driver must > be extra careful if it tries to store them > internally for later use outside of the RCU > section it obtained it in. > > It was possible for station teardown to race with > some htt events. The possible outcome could be a > use-after-free and a crash. > > Only peer-flow-control capable firmware was > affected (so hardware-wise qca99x0 and qca4019). > > This could be done in sta_state() itself via > explicit synchronize_net() call but there's > already a convenient sta_pre_rcu_remove() op that > can be hooked up to avoid extra rcu stall. > > The peer->sta pointer itself can't be set to > NULL/ERR_PTR because it is later used in > sta_state() for extra sanity checks. > > Signed-off-by: Michal Kazior <michal.kazior@tieto.com> Patch applied to ath-next branch of ath.git, thanks. 0a744d927406 ath10k: prevent sta pointer rcu violation
diff --git a/drivers/net/wireless/ath/ath10k/core.h b/drivers/net/wireless/ath/ath10k/core.h index c7664d6569fa..1ab589296dff 100644 --- a/drivers/net/wireless/ath/ath10k/core.h +++ b/drivers/net/wireless/ath/ath10k/core.h @@ -314,6 +314,7 @@ struct ath10k_peer { struct ieee80211_vif *vif; struct ieee80211_sta *sta; + bool removed; int vdev_id; u8 addr[ETH_ALEN]; DECLARE_BITMAP(peer_ids, ATH10K_MAX_NUM_PEER_IDS); diff --git a/drivers/net/wireless/ath/ath10k/mac.c b/drivers/net/wireless/ath/ath10k/mac.c index d1b7edba5e49..aa91f55b35a4 100644 --- a/drivers/net/wireless/ath/ath10k/mac.c +++ b/drivers/net/wireless/ath/ath10k/mac.c @@ -3774,6 +3774,9 @@ struct ieee80211_txq *ath10k_mac_txq_lookup(struct ath10k *ar, if (!peer) return NULL; + if (peer->removed) + return NULL; + if (peer->sta) return peer->sta->txq[tid]; else if (peer->vif) @@ -7476,6 +7479,20 @@ ath10k_mac_op_switch_vif_chanctx(struct ieee80211_hw *hw, return 0; } +static void ath10k_mac_op_sta_pre_rcu_remove(struct ieee80211_hw *hw, + struct ieee80211_vif *vif, + struct ieee80211_sta *sta) +{ + struct ath10k *ar; + struct ath10k_peer *peer; + + ar = hw->priv; + + list_for_each_entry(peer, &ar->peers, list) + if (peer->sta == sta) + peer->removed = true; +} + static const struct ieee80211_ops ath10k_ops = { .tx = ath10k_mac_op_tx, .wake_tx_queue = ath10k_mac_op_wake_tx_queue, @@ -7516,6 +7533,7 @@ static const struct ieee80211_ops ath10k_ops = { .assign_vif_chanctx = ath10k_mac_op_assign_vif_chanctx, .unassign_vif_chanctx = ath10k_mac_op_unassign_vif_chanctx, .switch_vif_chanctx = ath10k_mac_op_switch_vif_chanctx, + .sta_pre_rcu_remove = ath10k_mac_op_sta_pre_rcu_remove, CFG80211_TESTMODE_CMD(ath10k_tm_cmd)
Station pointers are RCU protected so driver must be extra careful if it tries to store them internally for later use outside of the RCU section it obtained it in. It was possible for station teardown to race with some htt events. The possible outcome could be a use-after-free and a crash. Only peer-flow-control capable firmware was affected (so hardware-wise qca99x0 and qca4019). This could be done in sta_state() itself via explicit synchronize_net() call but there's already a convenient sta_pre_rcu_remove() op that can be hooked up to avoid extra rcu stall. The peer->sta pointer itself can't be set to NULL/ERR_PTR because it is later used in sta_state() for extra sanity checks. Signed-off-by: Michal Kazior <michal.kazior@tieto.com> --- drivers/net/wireless/ath/ath10k/core.h | 1 + drivers/net/wireless/ath/ath10k/mac.c | 18 ++++++++++++++++++ 2 files changed, 19 insertions(+)