diff mbox series

[1/2] mmc: mmci: enable MMC_CAP_NEED_RSP_BUSY

Message ID 20210204120547.15381-2-yann.gautier@foss.st.com (mailing list archive)
State New, archived
Headers show
Series mmc: mmci/mmc_test: update mmc_erase management | expand

Commit Message

Yann Gautier Feb. 4, 2021, 12:05 p.m. UTC
From: Yann Gautier <yann.gautier@foss.st.com>

To properly manage commands awaiting R1B responses, the capability
MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
manage busy detection.
This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
MMC_CAP_WAIT_WHILE_BUSY to be enabled together.

Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
---
 drivers/mmc/host/mmci.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Marek Vasut Feb. 4, 2021, 12:26 p.m. UTC | #1
On 2/4/21 1:05 PM, yann.gautier@foss.st.com wrote:
> From: Yann Gautier <yann.gautier@foss.st.com>
> 
> To properly manage commands awaiting R1B responses, the capability
> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
> manage busy detection.
> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
> 
Shouldn't this have Fixes: tag ?
Yann Gautier Feb. 4, 2021, 12:54 p.m. UTC | #2
On 2/4/21 1:26 PM, Marek Vasut wrote:
> On 2/4/21 1:05 PM, yann.gautier@foss.st.com wrote:
>> From: Yann Gautier <yann.gautier@foss.st.com>
>>
>> To properly manage commands awaiting R1B responses, the capability
>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
>> manage busy detection.
>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
>>
> Shouldn't this have Fixes: tag ?

Hi Marek,

There is no unique patch that brought the issue, but a combination of 
several things:
- The series that brought the MMC_CAP_NEED_RSP_BUSY flag [1]
- The series that enabled MMC_ERASE for all hosts [2] (removal of 
MMC_CAP_ERASE)

But you're right, this patch may go on v5.8.x kernel and newer versions.


Regards,
Yann

  [1] 
https://patchwork.kernel.org/project/linux-mmc/cover/20200310153340.5593-1-ulf.hansson@linaro.org/
  [2] 
https://patchwork.kernel.org/project/linux-mmc/patch/20200508112853.23525-1-ulf.hansson@linaro.org/
Marek Vasut Feb. 4, 2021, 1:07 p.m. UTC | #3
On 2/4/21 1:54 PM, Yann GAUTIER wrote:
> On 2/4/21 1:26 PM, Marek Vasut wrote:
>> On 2/4/21 1:05 PM, yann.gautier@foss.st.com wrote:
>>> From: Yann Gautier <yann.gautier@foss.st.com>
>>>
>>> To properly manage commands awaiting R1B responses, the capability
>>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
>>> manage busy detection.
>>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
>>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
>>>
>> Shouldn't this have Fixes: tag ?
> 
> Hi Marek,
> 
> There is no unique patch that brought the issue, but a combination of 
> several things:
> - The series that brought the MMC_CAP_NEED_RSP_BUSY flag [1]
> - The series that enabled MMC_ERASE for all hosts [2] (removal of 
> MMC_CAP_ERASE)
> 
> But you're right, this patch may go on v5.8.x kernel and newer versions.

I think there will be quite some interest in 5.10.y LTS on the MP1 from 
the various industrial/embedded users, so it would be nice to have that 
5.10.y well maintained with necessary backports / fixes :)
Ulf Hansson Feb. 5, 2021, 9:53 a.m. UTC | #4
- trimmed cc-list

On Thu, 4 Feb 2021 at 13:08, <yann.gautier@foss.st.com> wrote:
>
> From: Yann Gautier <yann.gautier@foss.st.com>
>
> To properly manage commands awaiting R1B responses, the capability
> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
> manage busy detection.
> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.

Would it be possible for you to share a little bit more about the
problem? Like under what circumstances does things screw up?

Is the issue only occurring when the cmd->busy_timeout becomes larger
than host->max_busy_timeout. Or even in other cases?

>
> Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
> ---
>  drivers/mmc/host/mmci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> index 1bc674577ff9..bf6971fdd1a6 100644
> --- a/drivers/mmc/host/mmci.c
> +++ b/drivers/mmc/host/mmci.c
> @@ -2148,7 +2148,7 @@ static int mmci_probe(struct amba_device *dev,
>                 if (variant->busy_dpsm_flag)
>                         mmci_write_datactrlreg(host,
>                                                host->variant->busy_dpsm_flag);
> -               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
> +               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY | MMC_CAP_NEED_RSP_BUSY;

This isn't correct as the ux500 (and likely also other legacy
variants) don't need this. I have tried it in the past and it works
fine for ux500 without MMC_CAP_NEED_RSP_BUSY.

The difference is rather that the busy detection for stm32 variants
needs a corresponding HW busy timeout to be set (its
variant->busy_timeout flag is set). Perhaps we can use that
information instead?

Note that, MMC_CAP_NEED_RSP_BUSY, means that cmd->busy_timeout will
not be set by the core for erase commands, CMD5 and CMD6.

By looking at the code in mmci_start_command(), it looks like we will
default to a timeout of 10s, when cmd->busy_timeout isn't set. At
least for some erase requests, that won't be sufficient. Would it be
possible to disable the HW busy timeout in some way - and maybe use a
software timeout instead? Maybe I already asked Ludovic about this?
:-)

BTW, did you check that the MMCIDATATIMER does get the correct value
set for the timer in mmci_start_command() and if
host->max_busy_timeout gets correctly set in
mmci_set_max_busy_timeout()?

[...]

Kind regards
Uffe
Yann Gautier Feb. 5, 2021, 12:19 p.m. UTC | #5
On 2/5/21 10:53 AM, Ulf Hansson wrote:
> - trimmed cc-list
> 
> On Thu, 4 Feb 2021 at 13:08, <yann.gautier@foss.st.com> wrote:
>>
>> From: Yann Gautier <yann.gautier@foss.st.com>
>>
>> To properly manage commands awaiting R1B responses, the capability
>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
>> manage busy detection.
>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
> 
> Would it be possible for you to share a little bit more about the
> problem? Like under what circumstances does things screw up?
> 
> Is the issue only occurring when the cmd->busy_timeout becomes larger
> than host->max_busy_timeout. Or even in other cases?
> 
>>
>> Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
>> ---
>>   drivers/mmc/host/mmci.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
>> index 1bc674577ff9..bf6971fdd1a6 100644
>> --- a/drivers/mmc/host/mmci.c
>> +++ b/drivers/mmc/host/mmci.c
>> @@ -2148,7 +2148,7 @@ static int mmci_probe(struct amba_device *dev,
>>                  if (variant->busy_dpsm_flag)
>>                          mmci_write_datactrlreg(host,
>>                                                 host->variant->busy_dpsm_flag);
>> -               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
>> +               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY | MMC_CAP_NEED_RSP_BUSY;
> 
> This isn't correct as the ux500 (and likely also other legacy
> variants) don't need this. I have tried it in the past and it works
> fine for ux500 without MMC_CAP_NEED_RSP_BUSY.
> 
> The difference is rather that the busy detection for stm32 variants
> needs a corresponding HW busy timeout to be set (its
> variant->busy_timeout flag is set). Perhaps we can use that
> information instead?
> 
> Note that, MMC_CAP_NEED_RSP_BUSY, means that cmd->busy_timeout will
> not be set by the core for erase commands, CMD5 and CMD6.
> 
> By looking at the code in mmci_start_command(), it looks like we will
> default to a timeout of 10s, when cmd->busy_timeout isn't set. At
> least for some erase requests, that won't be sufficient. Would it be
> possible to disable the HW busy timeout in some way - and maybe use a
> software timeout instead? Maybe I already asked Ludovic about this?
> :-)
> 
> BTW, did you check that the MMCIDATATIMER does get the correct value
> set for the timer in mmci_start_command() and if
> host->max_busy_timeout gets correctly set in
> mmci_set_max_busy_timeout()?
> 
> [...]
> 
> Kind regards
> Uffe
> 

Hi Ulf,

Thanks for the hints.
I'll check all of that and get back with updated patches.

As I tried to explain in the cover letter and in reply to Adrian, I saw
a freeze (BUSYD0) in test 37 during MMC_ERASE command  with 
SECURE_ERASE_ARG, when running this test just after test 36 (or any 
other write test). But maybe, as you said that's mostly a incorrect 
timeout issue.

Regards,
Yann
Yann Gautier Feb. 8, 2021, 12:16 p.m. UTC | #6
On 2/5/21 1:19 PM, Yann GAUTIER wrote:
> On 2/5/21 10:53 AM, Ulf Hansson wrote:
>> - trimmed cc-list
>>
>> On Thu, 4 Feb 2021 at 13:08, <yann.gautier@foss.st.com> wrote:
>>>
>>> From: Yann Gautier <yann.gautier@foss.st.com>
>>>
>>> To properly manage commands awaiting R1B responses, the capability
>>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
>>> manage busy detection.
>>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
>>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
>>
>> Would it be possible for you to share a little bit more about the
>> problem? Like under what circumstances does things screw up?
>>
>> Is the issue only occurring when the cmd->busy_timeout becomes larger
>> than host->max_busy_timeout. Or even in other cases?
>>
>>>
>>> Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
>>> ---
>>>   drivers/mmc/host/mmci.c | 2 +-
>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
>>> index 1bc674577ff9..bf6971fdd1a6 100644
>>> --- a/drivers/mmc/host/mmci.c
>>> +++ b/drivers/mmc/host/mmci.c
>>> @@ -2148,7 +2148,7 @@ static int mmci_probe(struct amba_device *dev,
>>>                  if (variant->busy_dpsm_flag)
>>>                          mmci_write_datactrlreg(host,
>>>                                                 
>>> host->variant->busy_dpsm_flag);
>>> -               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
>>> +               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY | 
>>> MMC_CAP_NEED_RSP_BUSY;
>>
>> This isn't correct as the ux500 (and likely also other legacy
>> variants) don't need this. I have tried it in the past and it works
>> fine for ux500 without MMC_CAP_NEED_RSP_BUSY.
>>
>> The difference is rather that the busy detection for stm32 variants
>> needs a corresponding HW busy timeout to be set (its
>> variant->busy_timeout flag is set). Perhaps we can use that
>> information instead?
>>
>> Note that, MMC_CAP_NEED_RSP_BUSY, means that cmd->busy_timeout will
>> not be set by the core for erase commands, CMD5 and CMD6.
>>
>> By looking at the code in mmci_start_command(), it looks like we will
>> default to a timeout of 10s, when cmd->busy_timeout isn't set. At
>> least for some erase requests, that won't be sufficient. Would it be
>> possible to disable the HW busy timeout in some way - and maybe use a
>> software timeout instead? Maybe I already asked Ludovic about this?
>> :-)
>>
>> BTW, did you check that the MMCIDATATIMER does get the correct value
>> set for the timer in mmci_start_command() and if
>> host->max_busy_timeout gets correctly set in
>> mmci_set_max_busy_timeout()?
>>
>> [...]
>>
>> Kind regards
>> Uffe
>>
> 
> Hi Ulf,
> 
> Thanks for the hints.
> I'll check all of that and get back with updated patches.
> 
> As I tried to explain in the cover letter and in reply to Adrian, I saw
> a freeze (BUSYD0) in test 37 during MMC_ERASE command  with 
> SECURE_ERASE_ARG, when running this test just after test 36 (or any 
> other write test). But maybe, as you said that's mostly a incorrect 
> timeout issue.
> 
> Regards,
> Yann

Hi,

I made some extra tests, and the timeout value set in MMCIDATATIMER 
correspond to the one computed:
card->ext_csd.erase_group_def is set to 1 in mmc_init_card()
In mmc_mmc_erase_timeout(), we have:
erase_timeout = card->ext_csd.hc_erase_timeout; // 300ms * 0x07 (for the 
eMMC card I have: THGBMDG5D1LBAIL
erase_timeout *= card->ext_csd.sec_erase_mult; // 0xDC
erase_timeout *= qty; // 32 (from = 0x1d0000, to = 0x20ffff)

This leads to a timeout of 14784000ms (~4 hours).
The max_busy_timeout is 86767ms.

After those 4 hours, I get this message:
mmc1: Card stuck being busy! __mmc_poll_for_busy

The second erase with MMC_ERASE_ARG finds an erase timeout of 67200ms, 
and uses R1B command.
But as the first erase failed, the DPSMACT is still enabled, the busy 
timeout doesn't seem to happen. Something may be missing in the error path.

Anyway, I'll push an update of the second patch of the series, and just 
drop this first one.


Regards,
Yann
Yann Gautier Feb. 8, 2021, 1:31 p.m. UTC | #7
On 2/8/21 1:16 PM, Yann GAUTIER wrote:
> On 2/5/21 1:19 PM, Yann GAUTIER wrote:
>> On 2/5/21 10:53 AM, Ulf Hansson wrote:
>>> - trimmed cc-list
>>>
>>> On Thu, 4 Feb 2021 at 13:08, <yann.gautier@foss.st.com> wrote:
>>>>
>>>> From: Yann Gautier <yann.gautier@foss.st.com>
>>>>
>>>> To properly manage commands awaiting R1B responses, the capability
>>>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
>>>> manage busy detection.
>>>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
>>>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
>>>
>>> Would it be possible for you to share a little bit more about the
>>> problem? Like under what circumstances does things screw up?
>>>
>>> Is the issue only occurring when the cmd->busy_timeout becomes larger
>>> than host->max_busy_timeout. Or even in other cases?
>>>
>>>>
>>>> Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
>>>> ---
>>>>   drivers/mmc/host/mmci.c | 2 +-
>>>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
>>>> index 1bc674577ff9..bf6971fdd1a6 100644
>>>> --- a/drivers/mmc/host/mmci.c
>>>> +++ b/drivers/mmc/host/mmci.c
>>>> @@ -2148,7 +2148,7 @@ static int mmci_probe(struct amba_device *dev,
>>>>                  if (variant->busy_dpsm_flag)
>>>>                          mmci_write_datactrlreg(host,
>>>> host->variant->busy_dpsm_flag);
>>>> -               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
>>>> +               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY | 
>>>> MMC_CAP_NEED_RSP_BUSY;
>>>
>>> This isn't correct as the ux500 (and likely also other legacy
>>> variants) don't need this. I have tried it in the past and it works
>>> fine for ux500 without MMC_CAP_NEED_RSP_BUSY.
>>>
>>> The difference is rather that the busy detection for stm32 variants
>>> needs a corresponding HW busy timeout to be set (its
>>> variant->busy_timeout flag is set). Perhaps we can use that
>>> information instead?
>>>
>>> Note that, MMC_CAP_NEED_RSP_BUSY, means that cmd->busy_timeout will
>>> not be set by the core for erase commands, CMD5 and CMD6.
>>>
>>> By looking at the code in mmci_start_command(), it looks like we will
>>> default to a timeout of 10s, when cmd->busy_timeout isn't set. At
>>> least for some erase requests, that won't be sufficient. Would it be
>>> possible to disable the HW busy timeout in some way - and maybe use a
>>> software timeout instead? Maybe I already asked Ludovic about this?
>>> :-)
>>>
>>> BTW, did you check that the MMCIDATATIMER does get the correct value
>>> set for the timer in mmci_start_command() and if
>>> host->max_busy_timeout gets correctly set in
>>> mmci_set_max_busy_timeout()?
>>>
>>> [...]
>>>
>>> Kind regards
>>> Uffe
>>>
>>
>> Hi Ulf,
>>
>> Thanks for the hints.
>> I'll check all of that and get back with updated patches.
>>
>> As I tried to explain in the cover letter and in reply to Adrian, I saw
>> a freeze (BUSYD0) in test 37 during MMC_ERASE command  with 
>> SECURE_ERASE_ARG, when running this test just after test 36 (or any 
>> other write test). But maybe, as you said that's mostly a incorrect 
>> timeout issue.
>>
>> Regards,
>> Yann
> 
> Hi,
> 
> I made some extra tests, and the timeout value set in MMCIDATATIMER 
> correspond to the one computed:
> card->ext_csd.erase_group_def is set to 1 in mmc_init_card()
> In mmc_mmc_erase_timeout(), we have:
> erase_timeout = card->ext_csd.hc_erase_timeout; // 300ms * 0x07 (for the 
> eMMC card I have: THGBMDG5D1LBAIL
> erase_timeout *= card->ext_csd.sec_erase_mult; // 0xDC
> erase_timeout *= qty; // 32 (from = 0x1d0000, to = 0x20ffff)
> 
> This leads to a timeout of 14784000ms (~4 hours).
> The max_busy_timeout is 86767ms.
> 
> After those 4 hours, I get this message:
> mmc1: Card stuck being busy! __mmc_poll_for_busy
> 
> The second erase with MMC_ERASE_ARG finds an erase timeout of 67200ms, 
> and uses R1B command.
> But as the first erase failed, the DPSMACT is still enabled, the busy 
> timeout doesn't seem to happen. Something may be missing in the error path.
> 
> Anyway, I'll push an update of the second patch of the series, and just 
> drop this first one.
> 
> 
> Regards,
> Yann

I've discussed with Ludovic, and it is somewhat related to this patch set:
https://patchwork.kernel.org/project/linux-mmc/list/?series=186219&state=%2A&archive=both

The STM32 SDMMC IP needs a specific reset procedure when a data timeout 
occurs. If it is hardware, this is managed with the threaded IRQ. But if 
it is a SW polling (if R1B is replaced with R1), there is nothing in 
frameworks that could call this "unstuck" procedure for STM32 variant.
I don't know how this should be handled.


Regards,
Yann
Ulf Hansson Feb. 8, 2021, 3:03 p.m. UTC | #8
On Mon, 8 Feb 2021 at 13:16, Yann GAUTIER <yann.gautier@foss.st.com> wrote:
>
> On 2/5/21 1:19 PM, Yann GAUTIER wrote:
> > On 2/5/21 10:53 AM, Ulf Hansson wrote:
> >> - trimmed cc-list
> >>
> >> On Thu, 4 Feb 2021 at 13:08, <yann.gautier@foss.st.com> wrote:
> >>>
> >>> From: Yann Gautier <yann.gautier@foss.st.com>
> >>>
> >>> To properly manage commands awaiting R1B responses, the capability
> >>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
> >>> manage busy detection.
> >>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
> >>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
> >>
> >> Would it be possible for you to share a little bit more about the
> >> problem? Like under what circumstances does things screw up?
> >>
> >> Is the issue only occurring when the cmd->busy_timeout becomes larger
> >> than host->max_busy_timeout. Or even in other cases?
> >>
> >>>
> >>> Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
> >>> ---
> >>>   drivers/mmc/host/mmci.c | 2 +-
> >>>   1 file changed, 1 insertion(+), 1 deletion(-)
> >>>
> >>> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> >>> index 1bc674577ff9..bf6971fdd1a6 100644
> >>> --- a/drivers/mmc/host/mmci.c
> >>> +++ b/drivers/mmc/host/mmci.c
> >>> @@ -2148,7 +2148,7 @@ static int mmci_probe(struct amba_device *dev,
> >>>                  if (variant->busy_dpsm_flag)
> >>>                          mmci_write_datactrlreg(host,
> >>>
> >>> host->variant->busy_dpsm_flag);
> >>> -               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
> >>> +               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY |
> >>> MMC_CAP_NEED_RSP_BUSY;
> >>
> >> This isn't correct as the ux500 (and likely also other legacy
> >> variants) don't need this. I have tried it in the past and it works
> >> fine for ux500 without MMC_CAP_NEED_RSP_BUSY.
> >>
> >> The difference is rather that the busy detection for stm32 variants
> >> needs a corresponding HW busy timeout to be set (its
> >> variant->busy_timeout flag is set). Perhaps we can use that
> >> information instead?
> >>
> >> Note that, MMC_CAP_NEED_RSP_BUSY, means that cmd->busy_timeout will
> >> not be set by the core for erase commands, CMD5 and CMD6.
> >>
> >> By looking at the code in mmci_start_command(), it looks like we will
> >> default to a timeout of 10s, when cmd->busy_timeout isn't set. At
> >> least for some erase requests, that won't be sufficient. Would it be
> >> possible to disable the HW busy timeout in some way - and maybe use a
> >> software timeout instead? Maybe I already asked Ludovic about this?
> >> :-)
> >>
> >> BTW, did you check that the MMCIDATATIMER does get the correct value
> >> set for the timer in mmci_start_command() and if
> >> host->max_busy_timeout gets correctly set in
> >> mmci_set_max_busy_timeout()?
> >>
> >> [...]
> >>
> >> Kind regards
> >> Uffe
> >>
> >
> > Hi Ulf,
> >
> > Thanks for the hints.
> > I'll check all of that and get back with updated patches.
> >
> > As I tried to explain in the cover letter and in reply to Adrian, I saw
> > a freeze (BUSYD0) in test 37 during MMC_ERASE command  with
> > SECURE_ERASE_ARG, when running this test just after test 36 (or any
> > other write test). But maybe, as you said that's mostly a incorrect
> > timeout issue.
> >
> > Regards,
> > Yann
>
> Hi,
>
> I made some extra tests, and the timeout value set in MMCIDATATIMER
> correspond to the one computed:
> card->ext_csd.erase_group_def is set to 1 in mmc_init_card()
> In mmc_mmc_erase_timeout(), we have:
> erase_timeout = card->ext_csd.hc_erase_timeout; // 300ms * 0x07 (for the
> eMMC card I have: THGBMDG5D1LBAIL
> erase_timeout *= card->ext_csd.sec_erase_mult; // 0xDC
> erase_timeout *= qty; // 32 (from = 0x1d0000, to = 0x20ffff)
>
> This leads to a timeout of 14784000ms (~4 hours).
> The max_busy_timeout is 86767ms.
>
> After those 4 hours, I get this message:
> mmc1: Card stuck being busy! __mmc_poll_for_busy

Okay, I see.

This means that we end up polling for busy in __mmc_poll_for_busy().
However, not by using CMD13 but rather with the ->card_busy() ops (as
mmci has this callback set).

Could it be that the ->card_busy() callback isn't working correctly
for the stm32 variant?

What happens if you temporarily drop the assignment of the
->card_busy() callback, thus force the mmc core to poll with CMD13
instead? Would this work?

>
> The second erase with MMC_ERASE_ARG finds an erase timeout of 67200ms,
> and uses R1B command.
> But as the first erase failed, the DPSMACT is still enabled, the busy
> timeout doesn't seem to happen. Something may be missing in the error path.

Assuming the eMMC card completed the first erase operation
successfully, then yes, the second erase should work.

However, what if the eMMC actually failed with the first erase? How can we know?

>
> Anyway, I'll push an update of the second patch of the series, and just
> drop this first one.
>
>
> Regards,
> Yann

Kind regards
Uffe
Yann Gautier Feb. 9, 2021, 2:01 p.m. UTC | #9
On 2/8/21 4:03 PM, Ulf Hansson wrote:
> On Mon, 8 Feb 2021 at 13:16, Yann GAUTIER <yann.gautier@foss.st.com> wrote:
>>
>> On 2/5/21 1:19 PM, Yann GAUTIER wrote:
>>> On 2/5/21 10:53 AM, Ulf Hansson wrote:
>>>> - trimmed cc-list
>>>>
>>>> On Thu, 4 Feb 2021 at 13:08, <yann.gautier@foss.st.com> wrote:
>>>>>
>>>>> From: Yann Gautier <yann.gautier@foss.st.com>
>>>>>
>>>>> To properly manage commands awaiting R1B responses, the capability
>>>>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
>>>>> manage busy detection.
>>>>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
>>>>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
>>>>
>>>> Would it be possible for you to share a little bit more about the
>>>> problem? Like under what circumstances does things screw up?
>>>>
>>>> Is the issue only occurring when the cmd->busy_timeout becomes larger
>>>> than host->max_busy_timeout. Or even in other cases?
>>>>
>>>>>
>>>>> Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
>>>>> ---
>>>>>    drivers/mmc/host/mmci.c | 2 +-
>>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
>>>>> index 1bc674577ff9..bf6971fdd1a6 100644
>>>>> --- a/drivers/mmc/host/mmci.c
>>>>> +++ b/drivers/mmc/host/mmci.c
>>>>> @@ -2148,7 +2148,7 @@ static int mmci_probe(struct amba_device *dev,
>>>>>                   if (variant->busy_dpsm_flag)
>>>>>                           mmci_write_datactrlreg(host,
>>>>>
>>>>> host->variant->busy_dpsm_flag);
>>>>> -               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
>>>>> +               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY |
>>>>> MMC_CAP_NEED_RSP_BUSY;
>>>>
>>>> This isn't correct as the ux500 (and likely also other legacy
>>>> variants) don't need this. I have tried it in the past and it works
>>>> fine for ux500 without MMC_CAP_NEED_RSP_BUSY.
>>>>
>>>> The difference is rather that the busy detection for stm32 variants
>>>> needs a corresponding HW busy timeout to be set (its
>>>> variant->busy_timeout flag is set). Perhaps we can use that
>>>> information instead?
>>>>
>>>> Note that, MMC_CAP_NEED_RSP_BUSY, means that cmd->busy_timeout will
>>>> not be set by the core for erase commands, CMD5 and CMD6.
>>>>
>>>> By looking at the code in mmci_start_command(), it looks like we will
>>>> default to a timeout of 10s, when cmd->busy_timeout isn't set. At
>>>> least for some erase requests, that won't be sufficient. Would it be
>>>> possible to disable the HW busy timeout in some way - and maybe use a
>>>> software timeout instead? Maybe I already asked Ludovic about this?
>>>> :-)
>>>>
>>>> BTW, did you check that the MMCIDATATIMER does get the correct value
>>>> set for the timer in mmci_start_command() and if
>>>> host->max_busy_timeout gets correctly set in
>>>> mmci_set_max_busy_timeout()?
>>>>
>>>> [...]
>>>>
>>>> Kind regards
>>>> Uffe
>>>>
>>>
>>> Hi Ulf,
>>>
>>> Thanks for the hints.
>>> I'll check all of that and get back with updated patches.
>>>
>>> As I tried to explain in the cover letter and in reply to Adrian, I saw
>>> a freeze (BUSYD0) in test 37 during MMC_ERASE command  with
>>> SECURE_ERASE_ARG, when running this test just after test 36 (or any
>>> other write test). But maybe, as you said that's mostly a incorrect
>>> timeout issue.
>>>
>>> Regards,
>>> Yann
>>
>> Hi,
>>
>> I made some extra tests, and the timeout value set in MMCIDATATIMER
>> correspond to the one computed:
>> card->ext_csd.erase_group_def is set to 1 in mmc_init_card()
>> In mmc_mmc_erase_timeout(), we have:
>> erase_timeout = card->ext_csd.hc_erase_timeout; // 300ms * 0x07 (for the
>> eMMC card I have: THGBMDG5D1LBAIL
>> erase_timeout *= card->ext_csd.sec_erase_mult; // 0xDC
>> erase_timeout *= qty; // 32 (from = 0x1d0000, to = 0x20ffff)
>>
>> This leads to a timeout of 14784000ms (~4 hours).
>> The max_busy_timeout is 86767ms.
>>
>> After those 4 hours, I get this message:
>> mmc1: Card stuck being busy! __mmc_poll_for_busy
> 
> Okay, I see.
> 
> This means that we end up polling for busy in __mmc_poll_for_busy().
> However, not by using CMD13 but rather with the ->card_busy() ops (as
> mmci has this callback set).
> 
> Could it be that the ->card_busy() callback isn't working correctly
> for the stm32 variant?
> 
> What happens if you temporarily drop the assignment of the
> ->card_busy() callback, thus force the mmc core to poll with CMD13
> instead? Would this work?
> 

Hi Ulf,

When ->card_busy() is stubbed for MMC_ERASE command, CMD13 is running in 
loop, for ~66 seconds.
The card status is just 0xe00 here, no errors, just prog state, as 
awaited for CMD38, and READY_FOR_DATA bit not set.
And after those 66 seconds, the status changes to 0x900.
But busyd0 is still set to 1, during the CMD13 and after.
The test continues, with a CMD25, still with busyd0 and DPSM so the IP 
is stuck, and the STOP command is sent (mrq->stop in mmc_mrq_pr_debug).
And here nothing more happens, wait_for_completion() from 
mmc_wait_for_req_done().

>>
>> The second erase with MMC_ERASE_ARG finds an erase timeout of 67200ms,
>> and uses R1B command.
>> But as the first erase failed, the DPSMACT is still enabled, the busy
>> timeout doesn't seem to happen. Something may be missing in the error path.
> 
> Assuming the eMMC card completed the first erase operation
> successfully, then yes, the second erase should work.
> 
> However, what if the eMMC actually failed with the first erase? How can we know?

In the case where ->card_busy() is used:
As busyd0 is still 1 in the STATUS register, we cannot know if the 
command really finished. And as the DPSM is stuck, we cannot send other 
read command to check what is on the card.

As I said in another thread:
"I've discussed with Ludovic, and it is somewhat related to this patch set:
https://patchwork.kernel.org/project/linux-mmc/list/?series=186219&state=%2A&archive=both

The STM32 SDMMC IP needs a specific reset procedure when a data timeout 
occurs. If it is hardware, this is managed with the threaded IRQ. But if 
it is a SW polling (if R1B is replaced with R1), there is nothing in 
frameworks that could call this "unstuck" procedure for STM32 variant.
I don't know how this should be handled."

Are there other things I should check?

The main issue here is that we cannot use R1B if timeout > 
mmc->max_busy_timeout, and SW polling won't be able to call our reset 
procedure in case of trouble.

The second patch in the series, changing the MMC_ERASE argument from 
MMC_SECURE_ERASE_ARG to the argument chosen in the framework will then 
compute a timeout lower than mmc->max_busy_timeout, and the test will pass.
But this does not explain why STM32 SDMMC IP doesn't react well to this 
secure argument after it has executed a write test.


Thanks,
Yann

> 
>>
>> Anyway, I'll push an update of the second patch of the series, and just
>> drop this first one.
>>
>>
>> Regards,
>> Yann
> 
> Kind regards
> Uffe
>
Ulf Hansson Feb. 10, 2021, 12:03 p.m. UTC | #10
On Tue, 9 Feb 2021 at 15:01, Yann Gautier <yann.gautier@foss.st.com> wrote:
>
> On 2/8/21 4:03 PM, Ulf Hansson wrote:
> > On Mon, 8 Feb 2021 at 13:16, Yann GAUTIER <yann.gautier@foss.st.com> wrote:
> >>
> >> On 2/5/21 1:19 PM, Yann GAUTIER wrote:
> >>> On 2/5/21 10:53 AM, Ulf Hansson wrote:
> >>>> - trimmed cc-list
> >>>>
> >>>> On Thu, 4 Feb 2021 at 13:08, <yann.gautier@foss.st.com> wrote:
> >>>>>
> >>>>> From: Yann Gautier <yann.gautier@foss.st.com>
> >>>>>
> >>>>> To properly manage commands awaiting R1B responses, the capability
> >>>>> MMC_CAP_NEED_RSP_BUSY is enabled in mmci driver, for variants that
> >>>>> manage busy detection.
> >>>>> This R1B management needs both the flags MMC_CAP_NEED_RSP_BUSY and
> >>>>> MMC_CAP_WAIT_WHILE_BUSY to be enabled together.
> >>>>
> >>>> Would it be possible for you to share a little bit more about the
> >>>> problem? Like under what circumstances does things screw up?
> >>>>
> >>>> Is the issue only occurring when the cmd->busy_timeout becomes larger
> >>>> than host->max_busy_timeout. Or even in other cases?
> >>>>
> >>>>>
> >>>>> Signed-off-by: Yann Gautier <yann.gautier@foss.st.com>
> >>>>> ---
> >>>>>    drivers/mmc/host/mmci.c | 2 +-
> >>>>>    1 file changed, 1 insertion(+), 1 deletion(-)
> >>>>>
> >>>>> diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
> >>>>> index 1bc674577ff9..bf6971fdd1a6 100644
> >>>>> --- a/drivers/mmc/host/mmci.c
> >>>>> +++ b/drivers/mmc/host/mmci.c
> >>>>> @@ -2148,7 +2148,7 @@ static int mmci_probe(struct amba_device *dev,
> >>>>>                   if (variant->busy_dpsm_flag)
> >>>>>                           mmci_write_datactrlreg(host,
> >>>>>
> >>>>> host->variant->busy_dpsm_flag);
> >>>>> -               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
> >>>>> +               mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY |
> >>>>> MMC_CAP_NEED_RSP_BUSY;
> >>>>
> >>>> This isn't correct as the ux500 (and likely also other legacy
> >>>> variants) don't need this. I have tried it in the past and it works
> >>>> fine for ux500 without MMC_CAP_NEED_RSP_BUSY.
> >>>>
> >>>> The difference is rather that the busy detection for stm32 variants
> >>>> needs a corresponding HW busy timeout to be set (its
> >>>> variant->busy_timeout flag is set). Perhaps we can use that
> >>>> information instead?
> >>>>
> >>>> Note that, MMC_CAP_NEED_RSP_BUSY, means that cmd->busy_timeout will
> >>>> not be set by the core for erase commands, CMD5 and CMD6.
> >>>>
> >>>> By looking at the code in mmci_start_command(), it looks like we will
> >>>> default to a timeout of 10s, when cmd->busy_timeout isn't set. At
> >>>> least for some erase requests, that won't be sufficient. Would it be
> >>>> possible to disable the HW busy timeout in some way - and maybe use a
> >>>> software timeout instead? Maybe I already asked Ludovic about this?
> >>>> :-)
> >>>>
> >>>> BTW, did you check that the MMCIDATATIMER does get the correct value
> >>>> set for the timer in mmci_start_command() and if
> >>>> host->max_busy_timeout gets correctly set in
> >>>> mmci_set_max_busy_timeout()?
> >>>>
> >>>> [...]
> >>>>
> >>>> Kind regards
> >>>> Uffe
> >>>>
> >>>
> >>> Hi Ulf,
> >>>
> >>> Thanks for the hints.
> >>> I'll check all of that and get back with updated patches.
> >>>
> >>> As I tried to explain in the cover letter and in reply to Adrian, I saw
> >>> a freeze (BUSYD0) in test 37 during MMC_ERASE command  with
> >>> SECURE_ERASE_ARG, when running this test just after test 36 (or any
> >>> other write test). But maybe, as you said that's mostly a incorrect
> >>> timeout issue.
> >>>
> >>> Regards,
> >>> Yann
> >>
> >> Hi,
> >>
> >> I made some extra tests, and the timeout value set in MMCIDATATIMER
> >> correspond to the one computed:
> >> card->ext_csd.erase_group_def is set to 1 in mmc_init_card()
> >> In mmc_mmc_erase_timeout(), we have:
> >> erase_timeout = card->ext_csd.hc_erase_timeout; // 300ms * 0x07 (for the
> >> eMMC card I have: THGBMDG5D1LBAIL
> >> erase_timeout *= card->ext_csd.sec_erase_mult; // 0xDC
> >> erase_timeout *= qty; // 32 (from = 0x1d0000, to = 0x20ffff)
> >>
> >> This leads to a timeout of 14784000ms (~4 hours).
> >> The max_busy_timeout is 86767ms.
> >>
> >> After those 4 hours, I get this message:
> >> mmc1: Card stuck being busy! __mmc_poll_for_busy
> >
> > Okay, I see.
> >
> > This means that we end up polling for busy in __mmc_poll_for_busy().
> > However, not by using CMD13 but rather with the ->card_busy() ops (as
> > mmci has this callback set).
> >
> > Could it be that the ->card_busy() callback isn't working correctly
> > for the stm32 variant?
> >
> > What happens if you temporarily drop the assignment of the
> > ->card_busy() callback, thus force the mmc core to poll with CMD13
> > instead? Would this work?
> >
>
> Hi Ulf,
>
> When ->card_busy() is stubbed for MMC_ERASE command, CMD13 is running in
> loop, for ~66 seconds.
> The card status is just 0xe00 here, no errors, just prog state, as
> awaited for CMD38, and READY_FOR_DATA bit not set.
> And after those 66 seconds, the status changes to 0x900.
> But busyd0 is still set to 1, during the CMD13 and after.
> The test continues, with a CMD25, still with busyd0 and DPSM so the IP
> is stuck, and the STOP command is sent (mrq->stop in mmc_mrq_pr_debug).
> And here nothing more happens, wait_for_completion() from
> mmc_wait_for_req_done().

Okay, I see. Thanks for sharing the details.

>
> >>
> >> The second erase with MMC_ERASE_ARG finds an erase timeout of 67200ms,
> >> and uses R1B command.
> >> But as the first erase failed, the DPSMACT is still enabled, the busy
> >> timeout doesn't seem to happen. Something may be missing in the error path.
> >
> > Assuming the eMMC card completed the first erase operation
> > successfully, then yes, the second erase should work.
> >
> > However, what if the eMMC actually failed with the first erase? How can we know?
>
> In the case where ->card_busy() is used:
> As busyd0 is still 1 in the STATUS register, we cannot know if the
> command really finished. And as the DPSM is stuck, we cannot send other
> read command to check what is on the card.
>
> As I said in another thread:
> "I've discussed with Ludovic, and it is somewhat related to this patch set:
> https://patchwork.kernel.org/project/linux-mmc/list/?series=186219&state=%2A&archive=both
>
> The STM32 SDMMC IP needs a specific reset procedure when a data timeout
> occurs. If it is hardware, this is managed with the threaded IRQ. But if
> it is a SW polling (if R1B is replaced with R1), there is nothing in
> frameworks that could call this "unstuck" procedure for STM32 variant.
> I don't know how this should be handled."
>
> Are there other things I should check?
>
> The main issue here is that we cannot use R1B if timeout >
> mmc->max_busy_timeout, and SW polling won't be able to call our reset
> procedure in case of trouble.

Right, I understand.

As a simple initial fix, I would suggest you to set
MMC_CAP_NEED_RSP_BUSY for the stm32 variants. This also means you need
to cap the used timeout written to MMCIDATATIMER (when using
MMC_CAP_NEED_RSP_BUSY, cmd->busy_timeout may be greater than
host->max_busy_timeout).

I guess this should work for almost all cases, as 86767ms is rather long.

Long term wise, we need to think of another option. It seems to be
okay to use R1 in favor of R1B and then poll with CMD13. However, in
such cases the controller needs to be reset to move out from DPSM.
Perhaps we can let the core invoke a new callback in the host, to let
it deal with this in some way...?

>
> The second patch in the series, changing the MMC_ERASE argument from
> MMC_SECURE_ERASE_ARG to the argument chosen in the framework will then
> compute a timeout lower than mmc->max_busy_timeout, and the test will pass.
> But this does not explain why STM32 SDMMC IP doesn't react well to this
> secure argument after it has executed a write test.
>
>
> Thanks,
> Yann
>

Kind regards
Uffe
diff mbox series

Patch

diff --git a/drivers/mmc/host/mmci.c b/drivers/mmc/host/mmci.c
index 1bc674577ff9..bf6971fdd1a6 100644
--- a/drivers/mmc/host/mmci.c
+++ b/drivers/mmc/host/mmci.c
@@ -2148,7 +2148,7 @@  static int mmci_probe(struct amba_device *dev,
 		if (variant->busy_dpsm_flag)
 			mmci_write_datactrlreg(host,
 					       host->variant->busy_dpsm_flag);
-		mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY;
+		mmc->caps |= MMC_CAP_WAIT_WHILE_BUSY | MMC_CAP_NEED_RSP_BUSY;
 	}
 
 	/* Prepare a CMD12 - needed to clear the DPSM on some variants. */