diff mbox series

brcmfmac: run firmware state watchdog on the host machine

Message ID 20190222170739.14266-1-zajec5@gmail.com (mailing list archive)
State Changes Requested
Delegated to: Kalle Valo
Headers show
Series brcmfmac: run firmware state watchdog on the host machine | expand

Commit Message

Rafał Miłecki Feb. 22, 2019, 5:07 p.m. UTC
From: Rafał Miłecki <rafal@milecki.pl>

FullMAC firmware may happen to crash due to some potential bugs exposed
by e.g. a specific traffic or host-requested setup. It usually results
in various timeouts & running our of resources (e.g. ring slots).

Monitoring firmware state allows handling such a situation more
gracefully. At this point the watchdog:
1) Prints a clear error message about a firmware crash
2) Tries to dump a crash info data

That should be helpful for users & should allow providing a valuable
reports for the Broadcom developers. It obviously doesn't really fix
anything.

This watchdog is important for USB & SDIO devices which don't have
anything alike implemented yet. It's also there to complement PCIe with
its FWHALT signal which does not work in all situations.

It has been successfully tested with the recently released
brcmfmac4366c-pcie.bin.

Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=201853
Signed-off-by: Rafał Miłecki <rafal@milecki.pl>
---
 .../broadcom/brcm80211/brcmfmac/core.c        | 28 +++++++++++++++++++
 .../broadcom/brcm80211/brcmfmac/core.h        |  2 ++
 2 files changed, 30 insertions(+)

Comments

Arend van Spriel Feb. 22, 2019, 11:06 p.m. UTC | #1
On 2/22/2019 6:07 PM, Rafał Miłecki wrote:
> From: Rafał Miłecki <rafal@milecki.pl>
> 
> FullMAC firmware may happen to crash due to some potential bugs exposed
> by e.g. a specific traffic or host-requested setup. It usually results
> in various timeouts & running our of resources (e.g. ring slots).
> 
> Monitoring firmware state allows handling such a situation more
> gracefully. At this point the watchdog:
> 1) Prints a clear error message about a firmware crash
> 2) Tries to dump a crash info data

Hi Rafał,

I like the idea of having firmware crashes detected, but not sure I am a 
big fan of the watchdog mechanism.

> That should be helpful for users & should allow providing a valuable
> reports for the Broadcom developers. It obviously doesn't really fix
> anything.

Agree on the value of it and it is new functionality indeed and not a fix.

> This watchdog is important for USB & SDIO devices which don't have
> anything alike implemented yet. It's also there to complement PCIe with
> its FWHALT signal which does not work in all situations.

Actually, SDIO does already have a watchdog, but it serves different 
purpose (get firmware console, idle detect). There is also a 
brcmf_sdio_checkdied() function to detect a firmware crash. It is being 
called in brcmf_sdio_bus_rxctl() when firwmare does not respond as 
expected [1]. Similar thing can be done for PCIE and I actually started 
working on that last week. USB is a bit different as it will switch to 
bootloader image when firmware crashes.

Regards,
Arend

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/net/wireless/broadcom/brcm80211/brcmfmac/sdio.c#L3144
diff mbox series

Patch

diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
index 4fbe8791f674..ab7a54b2d831 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.c
@@ -40,6 +40,7 @@ 
 #include "common.h"
 
 #define MAX_WAIT_FOR_8021X_TX			msecs_to_jiffies(950)
+#define BRCMF_FW_WATCHDOG_POLL			msecs_to_jiffies(1000)
 
 #define BRCMF_BSSIDX_INVALID			-1
 
@@ -1084,6 +1085,28 @@  static int brcmf_revinfo_read(struct seq_file *s, void *data)
 	return 0;
 }
 
+static void brcmf_fw_watchdog(struct work_struct *work)
+{
+	struct brcmf_pub *drvr = container_of(work, struct brcmf_pub,
+					      fw_watchdog.work);
+
+	if (drvr->bus_if->state == BRCMF_BUS_UP) {
+		struct brcmf_if *ifp = drvr->iflist[0];
+		s32 ver;
+		int err;
+
+		err = brcmf_fil_cmd_int_get(ifp, BRCMF_C_GET_VERSION, &ver);
+		if (err) {
+			bphy_err(drvr, "Firmware has most likely crashed and didn't respond: %d\n",
+				 err);
+			brcmf_dev_coredump(drvr->bus_if->dev);
+			return;
+		}
+	}
+
+	schedule_delayed_work(&drvr->fw_watchdog, BRCMF_FW_WATCHDOG_POLL);
+}
+
 static int brcmf_bus_started(struct brcmf_pub *drvr, struct cfg80211_ops *ops)
 {
 	int ret = -1;
@@ -1155,6 +1178,9 @@  static int brcmf_bus_started(struct brcmf_pub *drvr, struct cfg80211_ops *ops)
 #endif
 #endif /* CONFIG_INET */
 
+	INIT_DELAYED_WORK(&drvr->fw_watchdog, brcmf_fw_watchdog);
+	schedule_delayed_work(&drvr->fw_watchdog, BRCMF_FW_WATCHDOG_POLL);
+
 	/* populate debugfs */
 	brcmf_debugfs_add_entry(drvr, "revinfo", brcmf_revinfo_read);
 	brcmf_feat_debugfs_create(drvr);
@@ -1284,6 +1310,8 @@  void brcmf_detach(struct device *dev)
 	if (drvr == NULL)
 		return;
 
+	cancel_delayed_work_sync(&drvr->fw_watchdog);
+
 #ifdef CONFIG_INET
 	unregister_inetaddr_notifier(&drvr->inetaddr_notifier);
 #endif
diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
index d8085ce579f4..1b44d243d05f 100644
--- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
+++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/core.h
@@ -143,6 +143,8 @@  struct brcmf_pub {
 	struct notifier_block inet6addr_notifier;
 	struct brcmf_mp_device *settings;
 
+	struct delayed_work fw_watchdog;
+
 	u8 clmver[BRCMF_DCMD_SMLEN];
 };