diff mbox series

[v2,2/2] wifi: ath9k: Reset chip on potential deaf state

Message ID 20241106090439.3487958-2-ih@simonwunderlich.de (mailing list archive)
State Superseded
Delegated to: Toke Høiland-Jørgensen
Headers show
Series [v2,1/2] wifi: ath9k: work around AR_CFG 0xdeadbeef chip hang | expand

Commit Message

Issam Hamdi Nov. 6, 2024, 9:04 a.m. UTC
From: Simon Wunderlich <simon.wunderlich@open-mesh.com>

The chip is switching seemingly random into a state which can be described
as "deaf". No or nearly no interrupts are generated anymore for incoming
packets. Existing links either break down after a while and new links will
not be established.

The driver doesn't know if there is no other device available or if it
ended up in an "deaf" state. Resetting the chip proactively avoids
permanent problems in case the chip really was in its "deaf" state but
maybe causes unnecessary resets in case it wasn't "deaf".

The circumstances leading to this "deafness" is still unclear,
but we see that some particular chips (especially 2-stream 11n
SoCs, but also others) can go 'deaf' when running AP or mesh (or both)
after some time. It's probably a hardware issue, and doing a channel
scan to trigger a chip reset (which one normally can't do on an AP interface)
recovers the hardware. This patch provides a workaround within the kernel.

The idea is that if the radio is idle anyway, a quick reset (which takes a few
tens of ms maximum) doesn't hurt much, and it helps to recover non-functional
APs or mesh points.

This patch originally developed by "Simon Wunderlich <simon.wunderlich@open-mesh.com>"
and "Sven Eckelmann <sven.eckelmann@open-mesh.com>"

Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
Signed-off-by: Sven Eckelmann <se@simonwunderlich.de>
Signed-off-by: Issam Hamdi <ih@simonwunderlich.de>
---
v2: change the "Co-developed-by" to "Signed-off-by", remove the dependency
on CONFIG_ATH9K_DEBUGFS and add more information in the commit description
---
 drivers/net/wireless/ath/ath9k/ath9k.h |  3 ++
 drivers/net/wireless/ath/ath9k/debug.c |  1 +
 drivers/net/wireless/ath/ath9k/debug.h |  1 +
 drivers/net/wireless/ath/ath9k/link.c  | 48 +++++++++++++++++++++++++-
 4 files changed, 52 insertions(+), 1 deletion(-)

Comments

Sven Eckelmann Nov. 6, 2024, 10:05 a.m. UTC | #1
On Wednesday, 6 November 2024 10:04:39 CET Issam Hamdi wrote:
[...]
> This patch originally developed by "Simon Wunderlich <simon.wunderlich@open-mesh.com>"
> and "Sven Eckelmann <sven.eckelmann@open-mesh.com>"

Am I the only person which finds this style of adding information about "Co-
authors" weird?

[...]
> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
> Signed-off-by: Sven Eckelmann <se@simonwunderlich.de>
> Signed-off-by: Issam Hamdi <ih@simonwunderlich.de>
> ---
> v2: change the "Co-developed-by" to "Signed-off-by", remove the dependency

I think Kalle meant that "Co-developed-by" should be followed by a 
"Signed-off-by" - not that "Co-developed-by" should be removed.

I was not part of the delivery path for this version of the patch. But
current Signed-off-by seem to suggest this.

> on CONFIG_ATH9K_DEBUGFS and add more information in the commit description

And please don't reply to the old thread when sending a new patchset - this 
becomes really unreadable after a while. You can simply use the method which 
b4 uses and just reference the old thread in your mail. Something like:

Changes in v2:
- change the "Co-developed-by" to "Signed-off-by"
- remove the dependency on CONFIG_ATH9K_DEBUGFS
- add more information in the commit description
- Link to v1: https://lore.kernel.org/r/20241104171627.3789199-1-ih@simonwunderlich.de

[...]
> +static bool ath_hw_hang_deaf(struct ath_softc *sc)
> +{
> +#ifdef CONFIG_ATH9K_TX99
> +	return false;
> +#else
> +	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
> +	u32 interrupts, interrupt_per_s;
> +	unsigned int interval;
> +
> +	/* get historic data */
> +	interval = jiffies_to_msecs(jiffies - sc->last_check_time);
> +	if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
> +		interrupts = sc->debug.stats.istats.rxlp;
> +	else
> +		interrupts = sc->debug.stats.istats.rxok;

You can't simply access sc->debug.stats.istats. sc->debug is only available 
when building with CONFIG_ATH9K_DEBUGFS. See ath9k.c

struct ath_softc {
[...]
#ifdef CONFIG_ATH9K_DEBUGFS
	struct ath9k_debug debug;
#endif
[...]
}

> +       /* sanity check, should be 4 seconds */
> +       if (interval > 10000 || interval < 1000)

Here you have hardcoded values but the actual interval is hidden behind 
ATH_HANG_WORK_INTERVAL. Two things which now are rather disconnected and might 
cause problems in the future (when somebody fiddles around with 
ATH_HANG_WORK_INTERVAL).

Overall, the proposal from Toke seems to be a lot better integrated in the HW 
check style which was introduced by Felix in the beginning of 2017 [1].

At the same time there was a proposal by Felix [2] - which diverged too much 
from our original patch (and as a result caused too many resets) [3]. I would 
therefore propose to check Toke's version and test handles the problem 
correctly.

Kind regards,
	Sven

[1] https://github.com/openwrt/openwrt/commit/b94177e10fc72f9309eae7459c3570e5c080e960
[2] https://patchwork.kernel.org/project/linux-wireless/patch/20170125163654.66431-3-nbd@nbd.name/
[3] https://lore.kernel.org/all/2081606.z26xgMiW1A@prime/
Toke Høiland-Jørgensen Nov. 6, 2024, 12:41 p.m. UTC | #2
Sven Eckelmann <se@simonwunderlich.de> writes:

> Overall, the proposal from Toke seems to be a lot better integrated in the HW 
> check style which was introduced by Felix in the beginning of 2017 [1].
>
> At the same time there was a proposal by Felix [2] - which diverged too much 
> from our original patch (and as a result caused too many resets) [3]. I would 
> therefore propose to check Toke's version and test handles the problem 
> correctly.

Yes, agreed. I did actually write up a commit message for that, let me
send it as a proper patch...

-Toke
Kalle Valo Nov. 7, 2024, 8:03 a.m. UTC | #3
Sven Eckelmann <se@simonwunderlich.de> writes:

> On Wednesday, 6 November 2024 10:04:39 CET Issam Hamdi wrote:
> [...]
>> This patch originally developed by "Simon Wunderlich <simon.wunderlich@open-mesh.com>"
>> and "Sven Eckelmann <sven.eckelmann@open-mesh.com>"
>
> Am I the only person which finds this style of adding information about "Co-
> authors" weird?

No, you are not. I also find it weird.

> [...]
>> Signed-off-by: Simon Wunderlich <sw@simonwunderlich.de>
>> Signed-off-by: Sven Eckelmann <se@simonwunderlich.de>
>> Signed-off-by: Issam Hamdi <ih@simonwunderlich.de>
>> ---
>> v2: change the "Co-developed-by" to "Signed-off-by", remove the dependency
>
> I think Kalle meant that "Co-developed-by" should be followed by a 
> "Signed-off-by" - not that "Co-developed-by" should be removed.

Correct, to my understanding having both c-d-b and s-o-b is the
preferred format.

>> on CONFIG_ATH9K_DEBUGFS and add more information in the commit description
>
> And please don't reply to the old thread when sending a new patchset - this 
> becomes really unreadable after a while. You can simply use the method which 
> b4 uses and just reference the old thread in your mail. Something like:
>
> Changes in v2:
> - change the "Co-developed-by" to "Signed-off-by"
> - remove the dependency on CONFIG_ATH9K_DEBUGFS
> - add more information in the commit description
> - Link to v1: https://lore.kernel.org/r/20241104171627.3789199-1-ih@simonwunderlich.de

Yes, this style is very much recommended. Having links to older versions
helps reviewers.
diff mbox series

Patch

diff --git a/drivers/net/wireless/ath/ath9k/ath9k.h b/drivers/net/wireless/ath/ath9k/ath9k.h
index c1ce081445a9..2b98c69fa37f 100644
--- a/drivers/net/wireless/ath/ath9k/ath9k.h
+++ b/drivers/net/wireless/ath/ath9k/ath9k.h
@@ -1026,6 +1026,9 @@  struct ath_softc {
 	short nbcnvifs;
 	unsigned long ps_usecount;
 
+	unsigned long last_check_time;
+	u32 last_check_interrupts;
+
 	struct ath_rx rx;
 	struct ath_tx tx;
 	struct ath_beacon beacon;
diff --git a/drivers/net/wireless/ath/ath9k/debug.c b/drivers/net/wireless/ath/ath9k/debug.c
index 6b2469a01f17..4128cf691166 100644
--- a/drivers/net/wireless/ath/ath9k/debug.c
+++ b/drivers/net/wireless/ath/ath9k/debug.c
@@ -751,6 +751,7 @@  static int read_file_reset(struct seq_file *file, void *data)
 		[RESET_TX_DMA_ERROR] = "Tx DMA stop error",
 		[RESET_RX_DMA_ERROR] = "Rx DMA stop error",
 		[RESET_TYPE_DEADBEEF] = "deadbeef hang",
+		[RESET_TYPE_DEAF] = "deaf hang",
 	};
 	int i;
 
diff --git a/drivers/net/wireless/ath/ath9k/debug.h b/drivers/net/wireless/ath/ath9k/debug.h
index 6ebb6053a8c1..76e27860455c 100644
--- a/drivers/net/wireless/ath/ath9k/debug.h
+++ b/drivers/net/wireless/ath/ath9k/debug.h
@@ -54,6 +54,7 @@  enum ath_reset_type {
 	RESET_TX_DMA_ERROR,
 	RESET_RX_DMA_ERROR,
 	RESET_TYPE_DEADBEEF,
+	RESET_TYPE_DEAF,
 	__RESET_TYPE_MAX
 };
 
diff --git a/drivers/net/wireless/ath/ath9k/link.c b/drivers/net/wireless/ath/ath9k/link.c
index 37438960c278..2700598c6336 100644
--- a/drivers/net/wireless/ath/ath9k/link.c
+++ b/drivers/net/wireless/ath/ath9k/link.c
@@ -162,13 +162,59 @@  static bool ath_hw_hang_deadbeef(struct ath_softc *sc)
 	return true;
 }
 
+static bool ath_hw_hang_deaf(struct ath_softc *sc)
+{
+#ifdef CONFIG_ATH9K_TX99
+	return false;
+#else
+	struct ath_common *common = ath9k_hw_common(sc->sc_ah);
+	u32 interrupts, interrupt_per_s;
+	unsigned int interval;
+
+	/* get historic data */
+	interval = jiffies_to_msecs(jiffies - sc->last_check_time);
+	if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
+		interrupts = sc->debug.stats.istats.rxlp;
+	else
+		interrupts = sc->debug.stats.istats.rxok;
+
+	interrupts -= sc->last_check_interrupts;
+
+	/* save current data */
+	sc->last_check_time = jiffies;
+	if (sc->sc_ah->caps.hw_caps & ATH9K_HW_CAP_EDMA)
+		sc->last_check_interrupts = sc->debug.stats.istats.rxlp;
+	else
+		sc->last_check_interrupts = sc->debug.stats.istats.rxok;
+
+	/* sanity check, should be 4 seconds */
+	if (interval > 10000 || interval < 1000)
+		return false;
+
+	/* should be at least one interrupt per second */
+	interrupt_per_s = interrupts / (interval / 1000);
+	if (interrupt_per_s >= 1)
+		return false;
+
+	ath_dbg(common, RESET,
+		"RX deaf hang is detected. Schedule chip reset\n");
+	ath9k_queue_reset(sc, RESET_TYPE_DEAF);
+
+	return true;
+#endif
+}
+
 void ath_hw_hang_work(struct work_struct *work)
 {
 	struct ath_softc *sc = container_of(work, struct ath_softc,
 					    hw_hang_work.work);
 
-	ath_hw_hang_deadbeef(sc);
+	if (ath_hw_hang_deadbeef(sc))
+		goto requeue_worker;
 
+	ath_hw_hang_deaf(sc);
+
+requeue_worker:
 	ieee80211_queue_delayed_work(sc->hw, &sc->hw_hang_work,
 				     msecs_to_jiffies(ATH_HANG_WORK_INTERVAL));
 }