mbox series

[00/13] Fix busydetect on Ux500 PL180/MMCI

Message ID 20230405-pl180-busydetect-fix-v1-0-28ac19a74e5e@linaro.org (mailing list archive)
Headers show
Series Fix busydetect on Ux500 PL180/MMCI | expand

Message

Linus Walleij April 5, 2023, 6:50 a.m. UTC
This series fixes a pretty serious problem in the MMCI
busy detect handling, discovered only after going up and
down a ladder of refactorings.

The code is written expecting the Ux500 busy detect
to fire two interrupts: one at the start of the busy
signalling and one at the end of the busy signalling.

The root cause of the problem seen on some devices
is that only the first IRQ arrives, and then the device
hangs, waiting perpetually for the next IRQ to arrive.

This is eventually solved by adding a timeout using
a delayed work that fire after 10 ms if the busy detect
has not stopped at this point. (Other delay spans can
be suggested.) This is the last patch in the series.

I included the rewrite of the entire busy detect logic
to use a state machine as this makes it way easier to
debug and will print messages about other error
conditions as well.

The problem affects especially the Skomer
(Samsung GT-I9070) and Kyle (Samsung SGH-I407).

It is fine to just apply this for the -next kernel,
despite it fixes the busy detect that has been broken
for these devices for a while, we can think about
backporting a simpler version of the timeout for
stable kernels if we want.

Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
---
Linus Walleij (13):
      mmc: mmci: Only call busy_complete if callback defined
      mmc: mmci: Clear busy_status when starting command
      mmc: mmci: Unwind big if() clause
      mmc: mmci: Stash status while waiting for busy
      mmc: mmci: Break out error check in busy detect
      mmc: mmci: Make busy complete state machine explicit
      mmc: mmci: Retry the busy start condition
      mmc: mmci: Use state machine state as exit condition
      mmc: mmci: Use a switch statement machine
      mmc: mmci: Break out a helper function
      mmc: mmci: mmci_card_busy() from state machine
      mmc: mmci: Drop end IRQ from Ux500 busydetect
      mmc: mmci: Add busydetect timeout

 drivers/mmc/host/mmci.c             | 165 +++++++++++++++++++++++++++---------
 drivers/mmc/host/mmci.h             |  17 ++++
 drivers/mmc/host/mmci_stm32_sdmmc.c |   6 +-
 3 files changed, 149 insertions(+), 39 deletions(-)
---
base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
change-id: 20230405-pl180-busydetect-fix-66a0360d398a

Best regards,

Comments

Yann Gautier April 6, 2023, 9:27 a.m. UTC | #1
On 4/5/23 08:50, Linus Walleij wrote:
> This series fixes a pretty serious problem in the MMCI
> busy detect handling, discovered only after going up and
> down a ladder of refactorings.
> 
> The code is written expecting the Ux500 busy detect
> to fire two interrupts: one at the start of the busy
> signalling and one at the end of the busy signalling.
> 
> The root cause of the problem seen on some devices
> is that only the first IRQ arrives, and then the device
> hangs, waiting perpetually for the next IRQ to arrive.
> 
> This is eventually solved by adding a timeout using
> a delayed work that fire after 10 ms if the busy detect
> has not stopped at this point. (Other delay spans can
> be suggested.) This is the last patch in the series.
> 
> I included the rewrite of the entire busy detect logic
> to use a state machine as this makes it way easier to
> debug and will print messages about other error
> conditions as well.
> 
> The problem affects especially the Skomer
> (Samsung GT-I9070) and Kyle (Samsung SGH-I407).
> 
> It is fine to just apply this for the -next kernel,
> despite it fixes the busy detect that has been broken
> for these devices for a while, we can think about
> backporting a simpler version of the timeout for
> stable kernels if we want.
> 
> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
> ---
> Linus Walleij (13):
>        mmc: mmci: Only call busy_complete if callback defined
>        mmc: mmci: Clear busy_status when starting command
>        mmc: mmci: Unwind big if() clause
>        mmc: mmci: Stash status while waiting for busy
>        mmc: mmci: Break out error check in busy detect
>        mmc: mmci: Make busy complete state machine explicit
>        mmc: mmci: Retry the busy start condition
>        mmc: mmci: Use state machine state as exit condition
>        mmc: mmci: Use a switch statement machine
>        mmc: mmci: Break out a helper function
>        mmc: mmci: mmci_card_busy() from state machine
>        mmc: mmci: Drop end IRQ from Ux500 busydetect
>        mmc: mmci: Add busydetect timeout
> 
>   drivers/mmc/host/mmci.c             | 165 +++++++++++++++++++++++++++---------
>   drivers/mmc/host/mmci.h             |  17 ++++
>   drivers/mmc/host/mmci_stm32_sdmmc.c |   6 +-
>   3 files changed, 149 insertions(+), 39 deletions(-)
> ---
> base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
> change-id: 20230405-pl180-busydetect-fix-66a0360d398a
> 
> Best regards,

Hi Linus,

With this series, it is no more possible to switch to UHS modes on 
STM32MP1. I will add some traces and try to see what's wrong or missing 
to have it working again.
I've tested on STM32MP157F-EV1 board and got those traces:
[    3.186504] mmc0: Skipping voltage switch
[    3.302170] mmc0: new high speed SDHC card at address aaaa

Whereas with the previous version I had:
[    1.882084] mmc0: new ultra high speed DDR50 SDHC card at address aaaa
[    1.899142] mmcblk0: mmc0:aaaa SC16G 14.8 GiB

On STM32MP13 that can do SDR104 (tested on an internal board), it fails 
to initialized the card:
[    3.194072] mmc0: error -110 whilst initialising SD card

When adding some dynamic debug traces, I can see this in the trace:
[   78.039648] mmc0: Signal voltage switch failed, power cycling card

I'll try to add more traces to see what happens exactly.


Best regards,
Yann
Yann Gautier April 6, 2023, 11:44 a.m. UTC | #2
On 4/6/23 11:27, Yann Gautier wrote:
> On 4/5/23 08:50, Linus Walleij wrote:
>> This series fixes a pretty serious problem in the MMCI
>> busy detect handling, discovered only after going up and
>> down a ladder of refactorings.
>>
>> The code is written expecting the Ux500 busy detect
>> to fire two interrupts: one at the start of the busy
>> signalling and one at the end of the busy signalling.
>>
>> The root cause of the problem seen on some devices
>> is that only the first IRQ arrives, and then the device
>> hangs, waiting perpetually for the next IRQ to arrive.
>>
>> This is eventually solved by adding a timeout using
>> a delayed work that fire after 10 ms if the busy detect
>> has not stopped at this point. (Other delay spans can
>> be suggested.) This is the last patch in the series.
>>
>> I included the rewrite of the entire busy detect logic
>> to use a state machine as this makes it way easier to
>> debug and will print messages about other error
>> conditions as well.
>>
>> The problem affects especially the Skomer
>> (Samsung GT-I9070) and Kyle (Samsung SGH-I407).
>>
>> It is fine to just apply this for the -next kernel,
>> despite it fixes the busy detect that has been broken
>> for these devices for a while, we can think about
>> backporting a simpler version of the timeout for
>> stable kernels if we want.
>>
>> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
>> ---
>> Linus Walleij (13):
>>        mmc: mmci: Only call busy_complete if callback defined
>>        mmc: mmci: Clear busy_status when starting command
>>        mmc: mmci: Unwind big if() clause
>>        mmc: mmci: Stash status while waiting for busy
>>        mmc: mmci: Break out error check in busy detect
>>        mmc: mmci: Make busy complete state machine explicit
>>        mmc: mmci: Retry the busy start condition
>>        mmc: mmci: Use state machine state as exit condition
>>        mmc: mmci: Use a switch statement machine
>>        mmc: mmci: Break out a helper function
>>        mmc: mmci: mmci_card_busy() from state machine
>>        mmc: mmci: Drop end IRQ from Ux500 busydetect
>>        mmc: mmci: Add busydetect timeout
>>
>>   drivers/mmc/host/mmci.c             | 165 
>> +++++++++++++++++++++++++++---------
>>   drivers/mmc/host/mmci.h             |  17 ++++
>>   drivers/mmc/host/mmci_stm32_sdmmc.c |   6 +-
>>   3 files changed, 149 insertions(+), 39 deletions(-)
>> ---
>> base-commit: fe15c26ee26efa11741a7b632e9f23b01aca4cc6
>> change-id: 20230405-pl180-busydetect-fix-66a0360d398a
>>
>> Best regards,
> 
> Hi Linus,
> 
> With this series, it is no more possible to switch to UHS modes on 
> STM32MP1. I will add some traces and try to see what's wrong or missing 
> to have it working again.
> I've tested on STM32MP157F-EV1 board and got those traces:
> [    3.186504] mmc0: Skipping voltage switch
> [    3.302170] mmc0: new high speed SDHC card at address aaaa
> 
> Whereas with the previous version I had:
> [    1.882084] mmc0: new ultra high speed DDR50 SDHC card at address aaaa
> [    1.899142] mmcblk0: mmc0:aaaa SC16G 14.8 GiB
> 
> On STM32MP13 that can do SDR104 (tested on an internal board), it fails 
> to initialized the card:
> [    3.194072] mmc0: error -110 whilst initialising SD card
> 
> When adding some dynamic debug traces, I can see this in the trace:
> [   78.039648] mmc0: Signal voltage switch failed, power cycling card
> 
> I'll try to add more traces to see what happens exactly.
> 
> 
> Best regards,
> Yann


Hi,

The issue occurs in  mmc_set_uhs_voltage(), when calling 
host->ops->card_busy(), which is mmci_card_busy() for this driver.
It is expected here to return busy after the SD_SWITCH_VOLTAGE (CMD11),
but host->busy_state = 0 in mmci_card_busy(), so it return 0.

A possible correction, but that may break what you've done, could be:

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 3ff6655a56d1..006fd84090ac 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -340,13 +340,10 @@ static int mmci_card_busy(struct mmc_host *mmc)
         unsigned long flags;
         int busy = 0;

-       if (host->ops->busy_complete) {
-               if ((host->busy_state == MMCI_BUSY_IDLE) ||
-                   (host->busy_state == MMCI_BUSY_DONE))
-                       return 0;
-               else
-                       return 1;
-       }
+       if (host->ops->busy_complete &&
+           host->busy_state != MMCI_BUSY_IDLE &&
+           host->busy_state != MMCI_BUSY_DONE)
+               return 1;

         spin_lock_irqsave(&host->lock, flags);


The SD-card is now properly detected (and in SDR104 mode) on STM32MP13 
board.


Best regards,
Yann