diff mbox series

[15/16] mac80211: agg-tx: don't schedule_and_wake_txq() under sta->lock

Message ID iwlwifi.20211129152938.d5fceeb7e166.I555fef8e67d93fff3d9a304886c4a9f8b322e591@changeid (mailing list archive)
State Superseded
Delegated to: Johannes Berg
Headers show
Series cfg80211/mac80211 patches from our internal tree 2021-11-29 | expand

Commit Message

Luca Coelho Nov. 29, 2021, 1:32 p.m. UTC
From: Johannes Berg <johannes.berg@intel.com>

When we call ieee80211_agg_start_txq(), that will in turn call
schedule_and_wake_txq(). Called from ieee80211_stop_tx_ba_cb()
this is done under sta->lock, which leads to certain circular
lock dependencies, as reported by Chris Murphy:
https://lore.kernel.org/r/CAJCQCtSXJ5qA4bqSPY=oLRMbv-irihVvP7A2uGutEbXQVkoNaw@mail.gmail.com

In general, ieee80211_agg_start_txq() is usually not called
with sta->lock held, only in this one place. But it's always
called with sta->ampdu_mlme.mtx held, and that's therefore
clearly sufficient.

Change ieee80211_stop_tx_ba_cb() to also call it without the
sta->lock held, by factoring it out of ieee80211_remove_tid_tx()
(which is only called in this one place).

This breaks the locking chain and makes it less likely that
we'll have similar locking chain problems in the future.

Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
---
 net/mac80211/agg-tx.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)

Comments

Toke Høiland-Jørgensen Nov. 29, 2021, 1:54 p.m. UTC | #1
Luca Coelho <luca@coelho.fi> writes:

> From: Johannes Berg <johannes.berg@intel.com>
>
> When we call ieee80211_agg_start_txq(), that will in turn call
> schedule_and_wake_txq(). Called from ieee80211_stop_tx_ba_cb()
> this is done under sta->lock, which leads to certain circular
> lock dependencies, as reported by Chris Murphy:
> https://lore.kernel.org/r/CAJCQCtSXJ5qA4bqSPY=oLRMbv-irihVvP7A2uGutEbXQVkoNaw@mail.gmail.com
>
> In general, ieee80211_agg_start_txq() is usually not called
> with sta->lock held, only in this one place. But it's always
> called with sta->ampdu_mlme.mtx held, and that's therefore
> clearly sufficient.
>
> Change ieee80211_stop_tx_ba_cb() to also call it without the
> sta->lock held, by factoring it out of ieee80211_remove_tid_tx()
> (which is only called in this one place).
>
> This breaks the locking chain and makes it less likely that
> we'll have similar locking chain problems in the future.
>
> Reported-by: Chris Murphy <lists@colorremedies.com>
> Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> Signed-off-by: Luca Coelho <luciano.coelho@intel.com>

Does this need a fixes: tag?

-Toke
Luca Coelho Nov. 30, 2021, 11:12 a.m. UTC | #2
On Mon, 2021-11-29 at 14:54 +0100, Toke Høiland-Jørgensen wrote:
> Luca Coelho <luca@coelho.fi> writes:
> 
> > From: Johannes Berg <johannes.berg@intel.com>
> > 
> > When we call ieee80211_agg_start_txq(), that will in turn call
> > schedule_and_wake_txq(). Called from ieee80211_stop_tx_ba_cb()
> > this is done under sta->lock, which leads to certain circular
> > lock dependencies, as reported by Chris Murphy:
> > https://lore.kernel.org/r/CAJCQCtSXJ5qA4bqSPY=oLRMbv-irihVvP7A2uGutEbXQVkoNaw@mail.gmail.com
> > 
> > In general, ieee80211_agg_start_txq() is usually not called
> > with sta->lock held, only in this one place. But it's always
> > called with sta->ampdu_mlme.mtx held, and that's therefore
> > clearly sufficient.
> > 
> > Change ieee80211_stop_tx_ba_cb() to also call it without the
> > sta->lock held, by factoring it out of ieee80211_remove_tid_tx()
> > (which is only called in this one place).
> > 
> > This breaks the locking chain and makes it less likely that
> > we'll have similar locking chain problems in the future.
> > 
> > Reported-by: Chris Murphy <lists@colorremedies.com>
> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
> 
> Does this need a fixes: tag?

Hi Toke,

Neither Johannes nor Chris pointed to any specific patch that this
commit is fixing.  If you know the exact commit that broke this, I can
add the tag and send v2.

Thanks!

--
Cheers,
Luca.
Toke Høiland-Jørgensen Nov. 30, 2021, 11:32 a.m. UTC | #3
Luca Coelho <luca@coelho.fi> writes:

> On Mon, 2021-11-29 at 14:54 +0100, Toke Høiland-Jørgensen wrote:
>> Luca Coelho <luca@coelho.fi> writes:
>> 
>> > From: Johannes Berg <johannes.berg@intel.com>
>> > 
>> > When we call ieee80211_agg_start_txq(), that will in turn call
>> > schedule_and_wake_txq(). Called from ieee80211_stop_tx_ba_cb()
>> > this is done under sta->lock, which leads to certain circular
>> > lock dependencies, as reported by Chris Murphy:
>> > https://lore.kernel.org/r/CAJCQCtSXJ5qA4bqSPY=oLRMbv-irihVvP7A2uGutEbXQVkoNaw@mail.gmail.com
>> > 
>> > In general, ieee80211_agg_start_txq() is usually not called
>> > with sta->lock held, only in this one place. But it's always
>> > called with sta->ampdu_mlme.mtx held, and that's therefore
>> > clearly sufficient.
>> > 
>> > Change ieee80211_stop_tx_ba_cb() to also call it without the
>> > sta->lock held, by factoring it out of ieee80211_remove_tid_tx()
>> > (which is only called in this one place).
>> > 
>> > This breaks the locking chain and makes it less likely that
>> > we'll have similar locking chain problems in the future.
>> > 
>> > Reported-by: Chris Murphy <lists@colorremedies.com>
>> > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>> > Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
>> 
>> Does this need a fixes: tag?
>
> Hi Toke,
>
> Neither Johannes nor Chris pointed to any specific patch that this
> commit is fixing.  If you know the exact commit that broke this, I can
> add the tag and send v2.

Well, it looks like the code you're changing comes all the way back from:
ba8c3d6f16a1 ("mac80211: add an intermediate software queue implementation")

Maybe Johannes can comment on whether it's appropriate to include this,
or if the code changed too much in the meantime...

-Toke
Johannes Berg Nov. 30, 2021, 11:52 a.m. UTC | #4
On Tue, 2021-11-30 at 12:32 +0100, Toke Høiland-Jørgensen wrote:
> Luca Coelho <luca@coelho.fi> writes:
> 
> > On Mon, 2021-11-29 at 14:54 +0100, Toke Høiland-Jørgensen wrote:
> > > Luca Coelho <luca@coelho.fi> writes:
> > > 
> > > > From: Johannes Berg <johannes.berg@intel.com>
> > > > 
> > > > When we call ieee80211_agg_start_txq(), that will in turn call
> > > > schedule_and_wake_txq(). Called from ieee80211_stop_tx_ba_cb()
> > > > this is done under sta->lock, which leads to certain circular
> > > > lock dependencies, as reported by Chris Murphy:
> > > > https://lore.kernel.org/r/CAJCQCtSXJ5qA4bqSPY=oLRMbv-irihVvP7A2uGutEbXQVkoNaw@mail.gmail.com
> > > > 
> > > > In general, ieee80211_agg_start_txq() is usually not called
> > > > with sta->lock held, only in this one place. But it's always
> > > > called with sta->ampdu_mlme.mtx held, and that's therefore
> > > > clearly sufficient.
> > > > 
> > > > Change ieee80211_stop_tx_ba_cb() to also call it without the
> > > > sta->lock held, by factoring it out of ieee80211_remove_tid_tx()
> > > > (which is only called in this one place).
> > > > 
> > > > This breaks the locking chain and makes it less likely that
> > > > we'll have similar locking chain problems in the future.
> > > > 
> > > > Reported-by: Chris Murphy <lists@colorremedies.com>
> > > > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > > > Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
> > > 
> > > Does this need a fixes: tag?
> > 
> > Hi Toke,
> > 
> > Neither Johannes nor Chris pointed to any specific patch that this
> > commit is fixing.  If you know the exact commit that broke this, I can
> > add the tag and send v2.
> 
> Well, it looks like the code you're changing comes all the way back from:
> ba8c3d6f16a1 ("mac80211: add an intermediate software queue implementation")
> 
> Maybe Johannes can comment on whether it's appropriate to include this,
> or if the code changed too much in the meantime...
> 

I think that probably makes sense, and I guess I can include that tag
when I apply it.

I suspect the reason I didn't do it internally (we have a different tag
though, so that's no excuse) is that there we didn't care until iwlwifi
actually gained TXQ support.

johannes
Toke Høiland-Jørgensen Nov. 30, 2021, 11:56 a.m. UTC | #5
Johannes Berg <johannes@sipsolutions.net> writes:

> On Tue, 2021-11-30 at 12:32 +0100, Toke Høiland-Jørgensen wrote:
>> Luca Coelho <luca@coelho.fi> writes:
>> 
>> > On Mon, 2021-11-29 at 14:54 +0100, Toke Høiland-Jørgensen wrote:
>> > > Luca Coelho <luca@coelho.fi> writes:
>> > > 
>> > > > From: Johannes Berg <johannes.berg@intel.com>
>> > > > 
>> > > > When we call ieee80211_agg_start_txq(), that will in turn call
>> > > > schedule_and_wake_txq(). Called from ieee80211_stop_tx_ba_cb()
>> > > > this is done under sta->lock, which leads to certain circular
>> > > > lock dependencies, as reported by Chris Murphy:
>> > > > https://lore.kernel.org/r/CAJCQCtSXJ5qA4bqSPY=oLRMbv-irihVvP7A2uGutEbXQVkoNaw@mail.gmail.com
>> > > > 
>> > > > In general, ieee80211_agg_start_txq() is usually not called
>> > > > with sta->lock held, only in this one place. But it's always
>> > > > called with sta->ampdu_mlme.mtx held, and that's therefore
>> > > > clearly sufficient.
>> > > > 
>> > > > Change ieee80211_stop_tx_ba_cb() to also call it without the
>> > > > sta->lock held, by factoring it out of ieee80211_remove_tid_tx()
>> > > > (which is only called in this one place).
>> > > > 
>> > > > This breaks the locking chain and makes it less likely that
>> > > > we'll have similar locking chain problems in the future.
>> > > > 
>> > > > Reported-by: Chris Murphy <lists@colorremedies.com>
>> > > > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
>> > > > Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
>> > > 
>> > > Does this need a fixes: tag?
>> > 
>> > Hi Toke,
>> > 
>> > Neither Johannes nor Chris pointed to any specific patch that this
>> > commit is fixing.  If you know the exact commit that broke this, I can
>> > add the tag and send v2.
>> 
>> Well, it looks like the code you're changing comes all the way back from:
>> ba8c3d6f16a1 ("mac80211: add an intermediate software queue implementation")
>> 
>> Maybe Johannes can comment on whether it's appropriate to include this,
>> or if the code changed too much in the meantime...
>> 
>
> I think that probably makes sense, and I guess I can include that tag
> when I apply it.
>
> I suspect the reason I didn't do it internally (we have a different tag
> though, so that's no excuse) is that there we didn't care until iwlwifi
> actually gained TXQ support.

Right, makes sense - thanks! :)

-Toke
Luca Coelho Nov. 30, 2021, 11:57 a.m. UTC | #6
On Tue, 2021-11-30 at 12:52 +0100, Johannes Berg wrote:
> On Tue, 2021-11-30 at 12:32 +0100, Toke Høiland-Jørgensen wrote:
> > Luca Coelho <luca@coelho.fi> writes:
> > 
> > > On Mon, 2021-11-29 at 14:54 +0100, Toke Høiland-Jørgensen wrote:
> > > > Luca Coelho <luca@coelho.fi> writes:
> > > > 
> > > > > From: Johannes Berg <johannes.berg@intel.com>
> > > > > 
> > > > > When we call ieee80211_agg_start_txq(), that will in turn call
> > > > > schedule_and_wake_txq(). Called from ieee80211_stop_tx_ba_cb()
> > > > > this is done under sta->lock, which leads to certain circular
> > > > > lock dependencies, as reported by Chris Murphy:
> > > > > https://lore.kernel.org/r/CAJCQCtSXJ5qA4bqSPY=oLRMbv-irihVvP7A2uGutEbXQVkoNaw@mail.gmail.com
> > > > > 
> > > > > In general, ieee80211_agg_start_txq() is usually not called
> > > > > with sta->lock held, only in this one place. But it's always
> > > > > called with sta->ampdu_mlme.mtx held, and that's therefore
> > > > > clearly sufficient.
> > > > > 
> > > > > Change ieee80211_stop_tx_ba_cb() to also call it without the
> > > > > sta->lock held, by factoring it out of ieee80211_remove_tid_tx()
> > > > > (which is only called in this one place).
> > > > > 
> > > > > This breaks the locking chain and makes it less likely that
> > > > > we'll have similar locking chain problems in the future.
> > > > > 
> > > > > Reported-by: Chris Murphy <lists@colorremedies.com>
> > > > > Signed-off-by: Johannes Berg <johannes.berg@intel.com>
> > > > > Signed-off-by: Luca Coelho <luciano.coelho@intel.com>
> > > > 
> > > > Does this need a fixes: tag?
> > > 
> > > Hi Toke,
> > > 
> > > Neither Johannes nor Chris pointed to any specific patch that this
> > > commit is fixing.  If you know the exact commit that broke this, I can
> > > add the tag and send v2.
> > 
> > Well, it looks like the code you're changing comes all the way back from:
> > ba8c3d6f16a1 ("mac80211: add an intermediate software queue implementation")
> > 
> > Maybe Johannes can comment on whether it's appropriate to include this,
> > or if the code changed too much in the meantime...
> > 
> 
> I think that probably makes sense, and I guess I can include that tag
> when I apply it.
> 
> I suspect the reason I didn't do it internally (we have a different tag
> though, so that's no excuse) is that there we didn't care until iwlwifi
> actually gained TXQ support.

Thanks, guys!

Johannes, do you want me to send v2 or do you want to add the tag
yourself? There is one more patch that is broken (ugh, sorry) that I
need to fix anyway...

--
Cheers,
Luca.
Johannes Berg Nov. 30, 2021, 12:08 p.m. UTC | #7
On Tue, 2021-11-30 at 13:57 +0200, Luca Coelho wrote:
> > > > 
> 
> Johannes, do you want me to send v2 or do you want to add the tag
> yourself? There is one more patch that is broken (ugh, sorry) that I
> need to fix anyway...
> 

Well if you resend I don't have to remember, so that wouldn't hurt
either ;-)

johannes
diff mbox series

Patch

diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 430a58587538..4dd56daed89b 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -9,7 +9,7 @@ 
  * Copyright 2007, Michael Wu <flamingice@sourmilk.net>
  * Copyright 2007-2010, Intel Corporation
  * Copyright(c) 2015-2017 Intel Deutschland GmbH
- * Copyright (C) 2018 - 2020 Intel Corporation
+ * Copyright (C) 2018 - 2021 Intel Corporation
  */
 
 #include <linux/ieee80211.h>
@@ -213,6 +213,8 @@  ieee80211_agg_start_txq(struct sta_info *sta, int tid, bool enable)
 	struct ieee80211_txq *txq = sta->sta.txq[tid];
 	struct txq_info *txqi;
 
+	lockdep_assert_held(&sta->ampdu_mlme.mtx);
+
 	if (!txq)
 		return;
 
@@ -290,7 +292,6 @@  static void ieee80211_remove_tid_tx(struct sta_info *sta, int tid)
 	ieee80211_assign_tid_tx(sta, tid, NULL);
 
 	ieee80211_agg_splice_finish(sta->sdata, tid);
-	ieee80211_agg_start_txq(sta, tid, false);
 
 	kfree_rcu(tid_tx, rcu_head);
 }
@@ -889,6 +890,7 @@  void ieee80211_stop_tx_ba_cb(struct sta_info *sta, int tid,
 {
 	struct ieee80211_sub_if_data *sdata = sta->sdata;
 	bool send_delba = false;
+	bool start_txq = false;
 
 	ht_dbg(sdata, "Stopping Tx BA session for %pM tid %d\n",
 	       sta->sta.addr, tid);
@@ -906,10 +908,14 @@  void ieee80211_stop_tx_ba_cb(struct sta_info *sta, int tid,
 		send_delba = true;
 
 	ieee80211_remove_tid_tx(sta, tid);
+	start_txq = true;
 
  unlock_sta:
 	spin_unlock_bh(&sta->lock);
 
+	if (start_txq)
+		ieee80211_agg_start_txq(sta, tid, false);
+
 	if (send_delba)
 		ieee80211_send_delba(sdata, sta->sta.addr, tid,
 			WLAN_BACK_INITIATOR, WLAN_REASON_QSTA_NOT_USE);