
MMC card detection may trip watchdog if card powered off

Message ID BN0PR08MB695133000AF116F04C3A9FFE83212@BN0PR08MB6951.namprd08.prod.outlook.com
State New
Series MMC card detection may trip watchdog if card powered off

Commit Message

Anthony Pighin (Nokia) Nov. 20, 2024, 4:21 p.m. UTC
If card detection is done by polling (because of broken-cd on the Freescale LX2160, etc., or for other reasons), the card is polled asynchronously and periodically.

If that polling happens after the card has been put in a powered-off state (i.e. during system shutdown/reboot), the poll times out. The timeout is long (10 s), and it is repeated three times. All of this happens after watchdogd has been disabled, so the system watchdogs are not being kicked.

If the MMC polling outlasts the watchdog timeout, the system is reset ungracefully. With a pretimeout-capable watchdog, the pretimeout trips unnecessarily instead.

    [   46.872767] mmc_mrq_pr_debug:274: mmc1: starting CMD6 arg 03220301 flags 0000049d
    [   46.880258] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000001
    [   46.886082] sdhci_irq:3558: mmc1: sdhci: IRQ status 0x00000002
    [   46.891906] mmc_request_done:187: mmc1: req done (CMD6): 0: 00000800 00000000 00000000 00000000
    [   46.900606] mmc_set_ios:892: mmc1: clock 0Hz busmode 2 powermode 0 cs 0 Vdd 0 width 1 timing 0
    [   46.914934] mmc_mrq_pr_debug:274: mmc1: starting CMD13 arg 00010000 flags 00000195
    [   57.433351] mmc1: Timeout waiting for hardware cmd interrupt.
    ...
    [   71.031911] [Redacted] 2030000.i2c:[Redacted]@41:watchdog: Watchdog interrupt received!
    [   71.039737] Kernel panic - not syncing: watchdog pretimeout event
    [   71.045820] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G           O       6.6.59 #1
    [   71.053207] Hardware name: [Redacted]
    [   71.059897] Call trace:
    [   71.062332]  dump_backtrace+0x9c/0x128
    ...

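CMD6 is SWITCH, and arg 03220301 writes EXT_CSD byte 34, POWER_OFF_NOTIFICATION (bits 23:16 = 0x22 = 34), with value 0x03 (bits 15:8), i.e. a long power-off notification.
CMD13 is SEND_STATUS. When it is issued after the POWER_OFF_NOTIFICATION, as above, the card has already been powered off (see the mmc_set_ios line: clock 0Hz, powermode 0, Vdd 0), so the command times out.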

I have made the following change to try to work around the issue. It seems to hold up, but it is quite brute force:

Anthony

Patch

--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -2046,6 +2046,11 @@  static void mmc_remove(struct mmc_host *host)
  */
 static int mmc_alive(struct mmc_host *host)
 {
+	if (host->card && mmc_card_suspended(host->card)) {
+		pr_err("%s: Skip card detection: Card suspended\n",
+		       mmc_hostname(host));
+		return -ENOMEDIUM;
+	}
 	return mmc_send_status(host->card, NULL);
 }