mmc: core: Do not hold re-tuning during CMD6 commands

Message ID	1490336341-22292-2-git-send-email-chaotian.jing@mediatek.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-mmc-owner@kernel.org> From: Chaotian Jing <chaotian.jing@mediatek.com> To: Ulf Hansson <ulf.hansson@linaro.org> CC: Matthias Brugger <matthias.bgg@gmail.com>, Adrian Hunter <adrian.hunter@intel.com>, Jaehoon Chung <jh80.chung@samsung.com>, Chaotian Jing <chaotian.jing@mediatek.com>, Shawn Lin <shawn.lin@rock-chips.com>, Masahiro Yamada <yamada.masahiro@socionext.com>, <linux-mmc@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <linux-arm-kernel@lists.infradead.org>, <linux-mediatek@lists.infradead.org>, <srv_heupstream@mediatek.com> Subject: [PATCH] mmc: core: Do not hold re-tuning during CMD6 commands Date: Fri, 24 Mar 2017 14:19:00 +0800 Message-ID: <1490336341-22292-2-git-send-email-chaotian.jing@mediatek.com> In-Reply-To: <1490336341-22292-1-git-send-email-chaotian.jing@mediatek.com> References: <1490336341-22292-1-git-send-email-chaotian.jing@mediatek.com> MIME-Version: 1.0 Content-Type: text/plain Sender: linux-mmc-owner@vger.kernel.org Precedence: bulk

Chaotian Jing March 24, 2017, 6:19 a.m. UTC

this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning
during switch commands")'
Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error,
then should do re-tune before the next CMD6 was sent.

Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
---
 drivers/mmc/core/mmc_ops.c |    3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Adrian Hunter March 24, 2017, 7:52 a.m. UTC | #1

On 24/03/17 08:19, Chaotian Jing wrote:
> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning
> during switch commands")'
> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error,
> then should do re-tune before the next CMD6 was sent.
> 
> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
> ---
>  drivers/mmc/core/mmc_ops.c |    3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> index fe80f26..6931927 100644
> --- a/drivers/mmc/core/mmc_ops.c
> +++ b/drivers/mmc/core/mmc_ops.c
> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
>  	bool use_r1b_resp = use_busy_signal;
>  	unsigned char old_timing = host->ios.timing;
>  
> -	mmc_retune_hold(host);
> -
>  	/*
>  	 * If the cmd timeout and the max_busy_timeout of the host are both
>  	 * specified, let's validate them. A failure means we need to prevent
> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
>  		cmd.sanitize_busy = true;
>  
>  	err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
> +	mmc_retune_hold(host);

That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as
it is here:

	https://marc.info/?l=linux-mmc&m=148940903816582

But using "retries" with commands that have busy-waiting on the data line
doesn't make much sense anyway.  Particularly with CRC errors, I would
expect the card is actually busily doing the switch and we need only to wait
for it.  The same can be true for timeout errors.  For some CMD6 we might
need to send CMD12 if the card is busy after an error.  I would prefer an
explicit attempt at recovery from CMD6 errors.

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chaotian Jing March 24, 2017, 8:32 a.m. UTC | #2

On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote:
> On 24/03/17 08:19, Chaotian Jing wrote:
> > this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning
> > during switch commands")'
> > Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error,
> > then should do re-tune before the next CMD6 was sent.
> > 
> > Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
> > ---
> >  drivers/mmc/core/mmc_ops.c |    3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> > index fe80f26..6931927 100644
> > --- a/drivers/mmc/core/mmc_ops.c
> > +++ b/drivers/mmc/core/mmc_ops.c
> > @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> >  	bool use_r1b_resp = use_busy_signal;
> >  	unsigned char old_timing = host->ios.timing;
> >  
> > -	mmc_retune_hold(host);
> > -
> >  	/*
> >  	 * If the cmd timeout and the max_busy_timeout of the host are both
> >  	 * specified, let's validate them. A failure means we need to prevent
> > @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> >  		cmd.sanitize_busy = true;
> >  
> >  	err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
> > +	mmc_retune_hold(host);
> 
> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as
> it is here:
> 
> 	https://marc.info/?l=linux-mmc&m=148940903816582
> 
> But using "retries" with commands that have busy-waiting on the data line
> doesn't make much sense anyway.  Particularly with CRC errors, I would
> expect the card is actually busily doing the switch and we need only to wait
> for it.  The same can be true for timeout errors.  For some CMD6 we might
> need to send CMD12 if the card is busy after an error.  I would prefer an
> explicit attempt at recovery from CMD6 errors.
> 

It's the host driver's responsibility to ensure card is not in busy
state before issue the next R1B command, or the MMC core layer needs do
extra check/waiting before issue a R1B command.
I think the purpose of "re-tune" is trying to cover particular case(eg.
voltage fluctuate or EMI or some glitch of host/device which caused CRC
error) , but in such cases, too many cases are disable re-tune function
by mmc_retune_hold(), for example, in this case, if a response CRC error
got then we never have chance to recover it. then cause system cannot
access emmc or suspend/resume fail.


--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian Hunter March 24, 2017, 9:19 a.m. UTC | #3

On 24/03/17 10:32, Chaotian Jing wrote:
> On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote:
>> On 24/03/17 08:19, Chaotian Jing wrote:
>>> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning
>>> during switch commands")'
>>> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error,
>>> then should do re-tune before the next CMD6 was sent.
>>>
>>> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
>>> ---
>>>  drivers/mmc/core/mmc_ops.c |    3 +--
>>>  1 file changed, 1 insertion(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
>>> index fe80f26..6931927 100644
>>> --- a/drivers/mmc/core/mmc_ops.c
>>> +++ b/drivers/mmc/core/mmc_ops.c
>>> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
>>>  	bool use_r1b_resp = use_busy_signal;
>>>  	unsigned char old_timing = host->ios.timing;
>>>  
>>> -	mmc_retune_hold(host);
>>> -
>>>  	/*
>>>  	 * If the cmd timeout and the max_busy_timeout of the host are both
>>>  	 * specified, let's validate them. A failure means we need to prevent
>>> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
>>>  		cmd.sanitize_busy = true;
>>>  
>>>  	err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
>>> +	mmc_retune_hold(host);
>>
>> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as
>> it is here:
>>
>> 	https://marc.info/?l=linux-mmc&m=148940903816582
>>
>> But using "retries" with commands that have busy-waiting on the data line
>> doesn't make much sense anyway.  Particularly with CRC errors, I would
>> expect the card is actually busily doing the switch and we need only to wait
>> for it.  The same can be true for timeout errors.  For some CMD6 we might
>> need to send CMD12 if the card is busy after an error.  I would prefer an
>> explicit attempt at recovery from CMD6 errors.
>>
> 
> It's the host driver's responsibility to ensure card is not in busy
> state before issue the next R1B command, or the MMC core layer needs do
> extra check/waiting before issue a R1B command.

Better to deal with cards stuck in busy from the places where busy-waiting
is expected.

> I think the purpose of "re-tune" is trying to cover particular case(eg.
> voltage fluctuate or EMI or some glitch of host/device which caused CRC
> error)

No, re-tuning is to compensate for drift caused primarily by temperature change.

> error) , but in such cases, too many cases are disable re-tune function
> by mmc_retune_hold(), for example, in this case, if a response CRC error
> got then we never have chance to recover it. then cause system cannot
> access emmc or suspend/resume fail.

Maybe you have a hardware problem.

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chaotian Jing March 24, 2017, 9:40 a.m. UTC | #4

On Fri, 2017-03-24 at 11:19 +0200, Adrian Hunter wrote:
> On 24/03/17 10:32, Chaotian Jing wrote:
> > On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote:
> >> On 24/03/17 08:19, Chaotian Jing wrote:
> >>> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning
> >>> during switch commands")'
> >>> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error,
> >>> then should do re-tune before the next CMD6 was sent.
> >>>
> >>> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
> >>> ---
> >>>  drivers/mmc/core/mmc_ops.c |    3 +--
> >>>  1 file changed, 1 insertion(+), 2 deletions(-)
> >>>
> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >>> index fe80f26..6931927 100644
> >>> --- a/drivers/mmc/core/mmc_ops.c
> >>> +++ b/drivers/mmc/core/mmc_ops.c
> >>> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> >>>  	bool use_r1b_resp = use_busy_signal;
> >>>  	unsigned char old_timing = host->ios.timing;
> >>>  
> >>> -	mmc_retune_hold(host);
> >>> -
> >>>  	/*
> >>>  	 * If the cmd timeout and the max_busy_timeout of the host are both
> >>>  	 * specified, let's validate them. A failure means we need to prevent
> >>> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> >>>  		cmd.sanitize_busy = true;
> >>>  
> >>>  	err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
> >>> +	mmc_retune_hold(host);
> >>
> >> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as
> >> it is here:
> >>
> >> 	https://marc.info/?l=linux-mmc&m=148940903816582
> >>
> >> But using "retries" with commands that have busy-waiting on the data line
> >> doesn't make much sense anyway.  Particularly with CRC errors, I would
> >> expect the card is actually busily doing the switch and we need only to wait
> >> for it.  The same can be true for timeout errors.  For some CMD6 we might
> >> need to send CMD12 if the card is busy after an error.  I would prefer an
> >> explicit attempt at recovery from CMD6 errors.
> >>
> > 
> > It's the host driver's responsibility to ensure card is not in busy
> > state before issue the next R1B command, or the MMC core layer needs do
> > extra check/waiting before issue a R1B command.
> 
> Better to deal with cards stuck in busy from the places where busy-waiting
> is expected.
> 
Yes, if a R1B command got response CRC error, we can do busy-waiting in
the error hander funtion(mmc_wait_for_req_done())
> > I think the purpose of "re-tune" is trying to cover particular case(eg.
> > voltage fluctuate or EMI or some glitch of host/device which caused CRC
> > error)
> 
> No, re-tuning is to compensate for drift caused primarily by temperature change.
> 
Yes, by JEDEC spec, temperature change cause timing drift of EMMC
device, but, as you mentioned, maybe I have a hardware problem of host,
but needs Software to cover it. so that we are doing our best to do
re-tune if got CRC error. if could recover it, then  it's better than
system hung.
> > error) , but in such cases, too many cases are disable re-tune function
> > by mmc_retune_hold(), for example, in this case, if a response CRC error
> > got then we never have chance to recover it. then cause system cannot
> > access emmc or suspend/resume fail.
> 
> Maybe you have a hardware problem.
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ulf Hansson March 24, 2017, 10:49 a.m. UTC | #5

On 24 March 2017 at 10:40, Chaotian Jing <chaotian.jing@mediatek.com> wrote:
> On Fri, 2017-03-24 at 11:19 +0200, Adrian Hunter wrote:
>> On 24/03/17 10:32, Chaotian Jing wrote:
>> > On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote:
>> >> On 24/03/17 08:19, Chaotian Jing wrote:
>> >>> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning
>> >>> during switch commands")'
>> >>> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error,
>> >>> then should do re-tune before the next CMD6 was sent.
>> >>>
>> >>> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
>> >>> ---
>> >>>  drivers/mmc/core/mmc_ops.c |    3 +--
>> >>>  1 file changed, 1 insertion(+), 2 deletions(-)
>> >>>
>> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
>> >>> index fe80f26..6931927 100644
>> >>> --- a/drivers/mmc/core/mmc_ops.c
>> >>> +++ b/drivers/mmc/core/mmc_ops.c
>> >>> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
>> >>>   bool use_r1b_resp = use_busy_signal;
>> >>>   unsigned char old_timing = host->ios.timing;
>> >>>
>> >>> - mmc_retune_hold(host);
>> >>> -
>> >>>   /*
>> >>>    * If the cmd timeout and the max_busy_timeout of the host are both
>> >>>    * specified, let's validate them. A failure means we need to prevent
>> >>> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
>> >>>           cmd.sanitize_busy = true;
>> >>>
>> >>>   err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
>> >>> + mmc_retune_hold(host);
>> >>
>> >> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as
>> >> it is here:
>> >>
>> >>    https://marc.info/?l=linux-mmc&m=148940903816582
>> >>
>> >> But using "retries" with commands that have busy-waiting on the data line
>> >> doesn't make much sense anyway.  Particularly with CRC errors, I would
>> >> expect the card is actually busily doing the switch and we need only to wait
>> >> for it.  The same can be true for timeout errors.  For some CMD6 we might
>> >> need to send CMD12 if the card is busy after an error.  I would prefer an
>> >> explicit attempt at recovery from CMD6 errors.
>> >>
>> >
>> > It's the host driver's responsibility to ensure card is not in busy
>> > state before issue the next R1B command, or the MMC core layer needs do
>> > extra check/waiting before issue a R1B command.
>>
>> Better to deal with cards stuck in busy from the places where busy-waiting
>> is expected.
>>
> Yes, if a R1B command got response CRC error, we can do busy-waiting in
> the error hander funtion(mmc_wait_for_req_done())

That would introduce too big changes and I am sure it will cause/hide
other problems.

If there is a problem in __mmc_switch(), let's try to fix it there first.

>> > I think the purpose of "re-tune" is trying to cover particular case(eg.
>> > voltage fluctuate or EMI or some glitch of host/device which caused CRC
>> > error)
>>
>> No, re-tuning is to compensate for drift caused primarily by temperature change.
>>
> Yes, by JEDEC spec, temperature change cause timing drift of EMMC
> device, but, as you mentioned, maybe I have a hardware problem of host,
> but needs Software to cover it. so that we are doing our best to do
> re-tune if got CRC error. if could recover it, then  it's better than
> system hung.

Exactly in what cases do you get CRC errors for CMD6. We need a full
cmd log to understand and to help.

>> > error) , but in such cases, too many cases are disable re-tune function
>> > by mmc_retune_hold(), for example, in this case, if a response CRC error
>> > got then we never have chance to recover it. then cause system cannot
>> > access emmc or suspend/resume fail.
>>
>> Maybe you have a hardware problem.

There is no way I am going to accept patches touching this part of the
mmc core, without providing real evidence for how it solves a problem.
To me, it seems like you are applying a workaround for another issue.

Again, try to provide us with some more data and logs, then perhaps we
can help narrow down the issues.

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chaotian Jing March 27, 2017, 1:35 a.m. UTC | #6

On Fri, 2017-03-24 at 11:49 +0100, Ulf Hansson wrote:
> On 24 March 2017 at 10:40, Chaotian Jing <chaotian.jing@mediatek.com> wrote:
> > On Fri, 2017-03-24 at 11:19 +0200, Adrian Hunter wrote:
> >> On 24/03/17 10:32, Chaotian Jing wrote:
> >> > On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote:
> >> >> On 24/03/17 08:19, Chaotian Jing wrote:
> >> >>> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning
> >> >>> during switch commands")'
> >> >>> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error,
> >> >>> then should do re-tune before the next CMD6 was sent.
> >> >>>
> >> >>> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com>
> >> >>> ---
> >> >>>  drivers/mmc/core/mmc_ops.c |    3 +--
> >> >>>  1 file changed, 1 insertion(+), 2 deletions(-)
> >> >>>
> >> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c
> >> >>> index fe80f26..6931927 100644
> >> >>> --- a/drivers/mmc/core/mmc_ops.c
> >> >>> +++ b/drivers/mmc/core/mmc_ops.c
> >> >>> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> >> >>>   bool use_r1b_resp = use_busy_signal;
> >> >>>   unsigned char old_timing = host->ios.timing;
> >> >>>
> >> >>> - mmc_retune_hold(host);
> >> >>> -
> >> >>>   /*
> >> >>>    * If the cmd timeout and the max_busy_timeout of the host are both
> >> >>>    * specified, let's validate them. A failure means we need to prevent
> >> >>> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value,
> >> >>>           cmd.sanitize_busy = true;
> >> >>>
> >> >>>   err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES);
> >> >>> + mmc_retune_hold(host);
> >> >>
> >> >> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as
> >> >> it is here:
> >> >>
> >> >>    https://marc.info/?l=linux-mmc&m=148940903816582
> >> >>
> >> >> But using "retries" with commands that have busy-waiting on the data line
> >> >> doesn't make much sense anyway.  Particularly with CRC errors, I would
> >> >> expect the card is actually busily doing the switch and we need only to wait
> >> >> for it.  The same can be true for timeout errors.  For some CMD6 we might
> >> >> need to send CMD12 if the card is busy after an error.  I would prefer an
> >> >> explicit attempt at recovery from CMD6 errors.
> >> >>
> >> >
> >> > It's the host driver's responsibility to ensure card is not in busy
> >> > state before issue the next R1B command, or the MMC core layer needs do
> >> > extra check/waiting before issue a R1B command.
> >>
> >> Better to deal with cards stuck in busy from the places where busy-waiting
> >> is expected.
> >>
> > Yes, if a R1B command got response CRC error, we can do busy-waiting in
> > the error hander funtion(mmc_wait_for_req_done())
> 
> That would introduce too big changes and I am sure it will cause/hide
> other problems.
> 
> If there is a problem in __mmc_switch(), let's try to fix it there first.
> 
Anyway, it is a bug of retry 3 times at max but without check current
card status and ensure it's in transfer state before next retry.
> >> > I think the purpose of "re-tune" is trying to cover particular case(eg.
> >> > voltage fluctuate or EMI or some glitch of host/device which caused CRC
> >> > error)
> >>
> >> No, re-tuning is to compensate for drift caused primarily by temperature change.
> >>
> > Yes, by JEDEC spec, temperature change cause timing drift of EMMC
> > device, but, as you mentioned, maybe I have a hardware problem of host,
> > but needs Software to cover it. so that we are doing our best to do
> > re-tune if got CRC error. if could recover it, then  it's better than
> > system hung.
> 
> Exactly in what cases do you get CRC errors for CMD6. We need a full
> cmd log to understand and to help.
> 
> >> > error) , but in such cases, too many cases are disable re-tune function
> >> > by mmc_retune_hold(), for example, in this case, if a response CRC error
> >> > got then we never have chance to recover it. then cause system cannot
> >> > access emmc or suspend/resume fail.
> >>
> >> Maybe you have a hardware problem.
> 
> There is no way I am going to accept patches touching this part of the
> mmc core, without providing real evidence for how it solves a problem.
> To me, it seems like you are applying a workaround for another issue.
> 
> Again, try to provide us with some more data and logs, then perhaps we
> can help narrow down the issues.
> 
> Kind regards
> Uffe

Below is the fail log of suspend fail.
the normal command tune result should be 0xffffff9ff, but some time, we
get the tune result of 0xffffffff, then we choose the 10 as the best
tune parameter, which is not stable.
I know that we should focus on why we get the result of 0xffffffff, this
may be result of device/host timing shifting while tuning. but what I
want to do is that when get a response CRC error, we can do re-tune to
recovery it, but not only return the -84 and cause suspend fail
eventually. if all hardware are perfect, then we don't need the re-tune
mechanism.

as Adrian's comment, if temperature change at here caused CMD6 response
CRC error, then how to recovery it ?

[  129.106622]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
[map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is
stable
[  129.109404]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
[map:ffffe03f] [maxlen:19] [final:22]
--------------------> below is next resume and re-init card:
[  129.778454]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set
error -22: 3300000 - 3300000
[  130.016987]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
[map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10
is not stable.
[  130.019556]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
[map:ffffc03f] [maxlen:18] [final:23]
[  130.124279]  (1)[1248:system_server]mmc0: cache flush error -84 
[  130.125058]  (1)[1248:system_server]dpm_run_callback():
mmc_bus_suspend+0x0/0x4c returns -84 
[  130.126104]  (1)[1248:system_server]PM: Device mmc0:0001 failed to
suspend: error -84 



--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ulf Hansson March 28, 2017, 8:30 a.m. UTC | #7

[...]

>>
>> If there is a problem in __mmc_switch(), let's try to fix it there first.
>>
> Anyway, it is a bug of retry 3 times at max but without check current
> card status and ensure it's in transfer state before next retry.

Correct. Do you want to send a patch that fixes this? Otherwise I can do it...

>> >> > I think the purpose of "re-tune" is trying to cover particular case(eg.
>> >> > voltage fluctuate or EMI or some glitch of host/device which caused CRC
>> >> > error)
>> >>
>> >> No, re-tuning is to compensate for drift caused primarily by temperature change.
>> >>
>> > Yes, by JEDEC spec, temperature change cause timing drift of EMMC
>> > device, but, as you mentioned, maybe I have a hardware problem of host,
>> > but needs Software to cover it. so that we are doing our best to do
>> > re-tune if got CRC error. if could recover it, then  it's better than
>> > system hung.
>>
>> Exactly in what cases do you get CRC errors for CMD6. We need a full
>> cmd log to understand and to help.
>>
>> >> > error) , but in such cases, too many cases are disable re-tune function
>> >> > by mmc_retune_hold(), for example, in this case, if a response CRC error
>> >> > got then we never have chance to recover it. then cause system cannot
>> >> > access emmc or suspend/resume fail.
>> >>
>> >> Maybe you have a hardware problem.
>>
>> There is no way I am going to accept patches touching this part of the
>> mmc core, without providing real evidence for how it solves a problem.
>> To me, it seems like you are applying a workaround for another issue.
>>
>> Again, try to provide us with some more data and logs, then perhaps we
>> can help narrow down the issues.
>>
>> Kind regards
>> Uffe
>
> Below is the fail log of suspend fail.
> the normal command tune result should be 0xffffff9ff, but some time, we
> get the tune result of 0xffffffff, then we choose the 10 as the best
> tune parameter, which is not stable.
> I know that we should focus on why we get the result of 0xffffffff, this
> may be result of device/host timing shifting while tuning. but what I
> want to do is that when get a response CRC error, we can do re-tune to
> recovery it, but not only return the -84 and cause suspend fail
> eventually. if all hardware are perfect, then we don't need the re-tune
> mechanism.

Thanks for elaborating!

Can you please also tell exactly which of the CMD6 commands in the
suspend sequence that is triggering this problem? Cache flush? Power
off notification?

>
> as Adrian's comment, if temperature change at here caused CMD6 response
> CRC error, then how to recovery it ?

So in your case, allowing re-tuning a little longer in __mmc_switch()
solves your problem. Clearly there are cases when we need to prevent
re-tuning when sending CMD6, however maybe not in all cases as we do
today.

For example it seems reasonable to not hold retuning before sending
CMD6 for cache flush, but instead it should be sufficient to hold it
before polling for busy in __mmc_switch().

Adrian, what's your thoughts on this?

>
> [  129.106622]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
> [map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is
> stable
> [  129.109404]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
> [map:ffffe03f] [maxlen:19] [final:22]
> --------------------> below is next resume and re-init card:
> [  129.778454]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set
> error -22: 3300000 - 3300000
> [  130.016987]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
> [map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10
> is not stable.

As you suspect the tuning didn't work out correctly, then why don't
you retry one more time?

> [  130.019556]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
> [map:ffffc03f] [maxlen:18] [final:23]
> [  130.124279]  (1)[1248:system_server]mmc0: cache flush error -84
> [  130.125058]  (1)[1248:system_server]dpm_run_callback():
> mmc_bus_suspend+0x0/0x4c returns -84
> [  130.126104]  (1)[1248:system_server]PM: Device mmc0:0001 failed to
> suspend: error -84
>
>
>

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Adrian Hunter March 28, 2017, 9:01 a.m. UTC | #8

On 28/03/17 11:30, Ulf Hansson wrote:
> [...]
> 
>>>
>>> If there is a problem in __mmc_switch(), let's try to fix it there first.
>>>
>> Anyway, it is a bug of retry 3 times at max but without check current
>> card status and ensure it's in transfer state before next retry.
> 
> Correct. Do you want to send a patch that fixes this? Otherwise I can do it...
> 
>>>>>> I think the purpose of "re-tune" is trying to cover particular case(eg.
>>>>>> voltage fluctuate or EMI or some glitch of host/device which caused CRC
>>>>>> error)
>>>>>
>>>>> No, re-tuning is to compensate for drift caused primarily by temperature change.
>>>>>
>>>> Yes, by JEDEC spec, temperature change cause timing drift of EMMC
>>>> device, but, as you mentioned, maybe I have a hardware problem of host,
>>>> but needs Software to cover it. so that we are doing our best to do
>>>> re-tune if got CRC error. if could recover it, then  it's better than
>>>> system hung.
>>>
>>> Exactly in what cases do you get CRC errors for CMD6. We need a full
>>> cmd log to understand and to help.
>>>
>>>>>> error) , but in such cases, too many cases are disable re-tune function
>>>>>> by mmc_retune_hold(), for example, in this case, if a response CRC error
>>>>>> got then we never have chance to recover it. then cause system cannot
>>>>>> access emmc or suspend/resume fail.
>>>>>
>>>>> Maybe you have a hardware problem.
>>>
>>> There is no way I am going to accept patches touching this part of the
>>> mmc core, without providing real evidence for how it solves a problem.
>>> To me, it seems like you are applying a workaround for another issue.
>>>
>>> Again, try to provide us with some more data and logs, then perhaps we
>>> can help narrow down the issues.
>>>
>>> Kind regards
>>> Uffe
>>
>> Below is the fail log of suspend fail.
>> the normal command tune result should be 0xffffff9ff, but some time, we
>> get the tune result of 0xffffffff, then we choose the 10 as the best
>> tune parameter, which is not stable.
>> I know that we should focus on why we get the result of 0xffffffff, this
>> may be result of device/host timing shifting while tuning. but what I
>> want to do is that when get a response CRC error, we can do re-tune to
>> recovery it, but not only return the -84 and cause suspend fail
>> eventually. if all hardware are perfect, then we don't need the re-tune
>> mechanism.
> 
> Thanks for elaborating!
> 
> Can you please also tell exactly which of the CMD6 commands in the
> suspend sequence that is triggering this problem? Cache flush? Power
> off notification?
> 
>>
>> as Adrian's comment, if temperature change at here caused CMD6 response
>> CRC error, then how to recovery it ?
> 
> So in your case, allowing re-tuning a little longer in __mmc_switch()
> solves your problem. Clearly there are cases when we need to prevent
> re-tuning when sending CMD6, however maybe not in all cases as we do
> today.
> 
> For example it seems reasonable to not hold retuning before sending
> CMD6 for cache flush, but instead it should be sufficient to hold it
> before polling for busy in __mmc_switch().
> 
> Adrian, what's your thoughts on this?

mmc_retune_hold() and mmc_retune_release() are designed to go around a group
of commands, but re-tuning can still be done before the first command. i.e.

	mmc_retune_hold
	<re-tune can happen here>
	cmd A
	<re-tune not allowed here>
	cmd B
	<re-tune not allowed here>
	cmd C
	mmc_retune_release

That is the same in the retry case:

	mmc_retune_hold
	<re-tune can happen here>
	cmd A
	<re-tune not allowed here>
	retry cmd A
	<re-tune not allowed here>
	cmd B
	<re-tune not allowed here>
	cmd C
	mmc_retune_release

The retry mechanism provided by mmc_wait_for_cmd() and friends really only
makes sense for simple commands.  In other cases, like this, we need to
consider what state the card is in.  For __mmc_switch we need to consider
whether the card is busy or whether a timing change been made.

> 
>>
>> [  129.106622]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
>> [map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is
>> stable
>> [  129.109404]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
>> [map:ffffe03f] [maxlen:19] [final:22]
>> --------------------> below is next resume and re-init card:
>> [  129.778454]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set
>> error -22: 3300000 - 3300000
>> [  130.016987]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
>> [map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10
>> is not stable.
> 
> As you suspect the tuning didn't work out correctly, then why don't
> you retry one more time?

Or restore the previously known good result?

> 
>> [  130.019556]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
>> [map:ffffc03f] [maxlen:18] [final:23]
>> [  130.124279]  (1)[1248:system_server]mmc0: cache flush error -84
>> [  130.125058]  (1)[1248:system_server]dpm_run_callback():
>> mmc_bus_suspend+0x0/0x4c returns -84
>> [  130.126104]  (1)[1248:system_server]PM: Device mmc0:0001 failed to
>> suspend: error -84
>>
>>
>>
> 
> Kind regards
> Uffe
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ulf Hansson March 28, 2017, 9:58 a.m. UTC | #9

On 28 March 2017 at 11:01, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 28/03/17 11:30, Ulf Hansson wrote:
>> [...]
>>
>>>>
>>>> If there is a problem in __mmc_switch(), let's try to fix it there first.
>>>>
>>> Anyway, it is a bug of retry 3 times at max but without check current
>>> card status and ensure it's in transfer state before next retry.
>>
>> Correct. Do you want to send a patch that fixes this? Otherwise I can do it...
>>
>>>>>>> I think the purpose of "re-tune" is trying to cover particular case(eg.
>>>>>>> voltage fluctuate or EMI or some glitch of host/device which caused CRC
>>>>>>> error)
>>>>>>
>>>>>> No, re-tuning is to compensate for drift caused primarily by temperature change.
>>>>>>
>>>>> Yes, by JEDEC spec, temperature change cause timing drift of EMMC
>>>>> device, but, as you mentioned, maybe I have a hardware problem of host,
>>>>> but needs Software to cover it. so that we are doing our best to do
>>>>> re-tune if got CRC error. if could recover it, then  it's better than
>>>>> system hung.
>>>>
>>>> Exactly in what cases do you get CRC errors for CMD6. We need a full
>>>> cmd log to understand and to help.
>>>>
>>>>>>> error) , but in such cases, too many cases are disable re-tune function
>>>>>>> by mmc_retune_hold(), for example, in this case, if a response CRC error
>>>>>>> got then we never have chance to recover it. then cause system cannot
>>>>>>> access emmc or suspend/resume fail.
>>>>>>
>>>>>> Maybe you have a hardware problem.
>>>>
>>>> There is no way I am going to accept patches touching this part of the
>>>> mmc core, without providing real evidence for how it solves a problem.
>>>> To me, it seems like you are applying a workaround for another issue.
>>>>
>>>> Again, try to provide us with some more data and logs, then perhaps we
>>>> can help narrow down the issues.
>>>>
>>>> Kind regards
>>>> Uffe
>>>
>>> Below is the fail log of suspend fail.
>>> the normal command tune result should be 0xffffff9ff, but some time, we
>>> get the tune result of 0xffffffff, then we choose the 10 as the best
>>> tune parameter, which is not stable.
>>> I know that we should focus on why we get the result of 0xffffffff, this
>>> may be result of device/host timing shifting while tuning. but what I
>>> want to do is that when get a response CRC error, we can do re-tune to
>>> recovery it, but not only return the -84 and cause suspend fail
>>> eventually. if all hardware are perfect, then we don't need the re-tune
>>> mechanism.
>>
>> Thanks for elaborating!
>>
>> Can you please also tell exactly which of the CMD6 commands in the
>> suspend sequence that is triggering this problem? Cache flush? Power
>> off notification?
>>
>>>
>>> as Adrian's comment, if temperature change at here caused CMD6 response
>>> CRC error, then how to recovery it ?
>>
>> So in your case, allowing re-tuning a little longer in __mmc_switch()
>> solves your problem. Clearly there are cases when we need to prevent
>> re-tuning when sending CMD6, however maybe not in all cases as we do
>> today.
>>
>> For example it seems reasonable to not hold retuning before sending
>> CMD6 for cache flush, but instead it should be sufficient to hold it
>> before polling for busy in __mmc_switch().
>>
>> Adrian, what's your thoughts on this?
>
> mmc_retune_hold() and mmc_retune_release() are designed to go around a group
> of commands, but re-tuning can still be done before the first command. i.e.
>
>         mmc_retune_hold
>         <re-tune can happen here>
>         cmd A
>         <re-tune not allowed here>
>         cmd B
>         <re-tune not allowed here>
>         cmd C
>         mmc_retune_release
>
> That is the same in the retry case
>
>         mmc_retune_hold
>         <re-tune can happen here>
>         cmd A
>         <re-tune not allowed here>
>         retry cmd A
>         <re-tune not allowed here>
>         cmd B
>         <re-tune not allowed here>
>         cmd C
>         mmc_retune_release

Ohh, right! Thanks for the detailed description.

>
> The retry mechanism provided by mmc_wait_for_cmd() and friends really only
> makes sense for simple commands.  In other cases, like this, we need to
> consider what state the card is in.  For __mmc_switch we need to consider
> whether the card is busy or whether a timing change been made.

I definitely agree. We should remove retries for CMD6 and perhaps also
for some other cases.

When we have changed the above in __mmc_switch(), the change Chaotian
suggest gets a different impact, as it would potentially allow a
re-tuning to happen before the next CMD1to poll for busy or to check
the switch status. This isn't okay.

This all sounds to me that Chaotian's issue may not all be related to
tuning, but to the CMD6 switch sequence itself. However I may be wrong
- of course. :-)

[...]

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chaotian Jing March 28, 2017, 9:59 a.m. UTC | #10

On Tue, 2017-03-28 at 12:01 +0300, Adrian Hunter wrote:
> On 28/03/17 11:30, Ulf Hansson wrote:
> > [...]
> > 
> >>>
> >>> If there is a problem in __mmc_switch(), let's try to fix it there first.
> >>>
> >> Anyway, it is a bug of retry 3 times at max but without check current
> >> card status and ensure it's in transfer state before next retry.
> > 
> > Correct. Do you want to send a patch that fixes this? Otherwise I can do it...
> > 
Actually, it may hard to check card status, if host supports
MMC_CAP_WAIT_WHILE_BUSY or mmc->card_busy(), it will easy to know
current card status, or we must issue CMD13 to do polling, but as you
know, CMD13 may also gets response CRC error.
> >>>>>> I think the purpose of "re-tune" is trying to cover particular case(eg.
> >>>>>> voltage fluctuate or EMI or some glitch of host/device which caused CRC
> >>>>>> error)
> >>>>>
> >>>>> No, re-tuning is to compensate for drift caused primarily by temperature change.
> >>>>>
> >>>> Yes, by JEDEC spec, temperature change cause timing drift of EMMC
> >>>> device, but, as you mentioned, maybe I have a hardware problem of host,
> >>>> but needs Software to cover it. so that we are doing our best to do
> >>>> re-tune if got CRC error. if could recover it, then  it's better than
> >>>> system hung.
> >>>
> >>> Exactly in what cases do you get CRC errors for CMD6. We need a full
> >>> cmd log to understand and to help.
> >>>
> >>>>>> error) , but in such cases, too many cases are disable re-tune function
> >>>>>> by mmc_retune_hold(), for example, in this case, if a response CRC error
> >>>>>> got then we never have chance to recover it. then cause system cannot
> >>>>>> access emmc or suspend/resume fail.
> >>>>>
> >>>>> Maybe you have a hardware problem.
> >>>
> >>> There is no way I am going to accept patches touching this part of the
> >>> mmc core, without providing real evidence for how it solves a problem.
> >>> To me, it seems like you are applying a workaround for another issue.
> >>>
> >>> Again, try to provide us with some more data and logs, then perhaps we
> >>> can help narrow down the issues.
> >>>
> >>> Kind regards
> >>> Uffe
> >>
> >> Below is the fail log of suspend fail.
> >> the normal command tune result should be 0xffffff9ff, but some time, we
> >> get the tune result of 0xffffffff, then we choose the 10 as the best
> >> tune parameter, which is not stable.
> >> I know that we should focus on why we get the result of 0xffffffff, this
> >> may be result of device/host timing shifting while tuning. but what I
> >> want to do is that when get a response CRC error, we can do re-tune to
> >> recovery it, but not only return the -84 and cause suspend fail
> >> eventually. if all hardware are perfect, then we don't need the re-tune
> >> mechanism.
> > 
> > Thanks for elaborating!
> > 
> > Can you please also tell exactly which of the CMD6 commands in the
> > suspend sequence that is triggering this problem? Cache flush? Power
> > off notification?
> > 
> >>
> >> as Adrian's comment, if temperature change at here caused CMD6 response
> >> CRC error, then how to recovery it ?
> > 
> > So in your case, allowing re-tuning a little longer in __mmc_switch()
> > solves your problem. Clearly there are cases when we need to prevent
> > re-tuning when sending CMD6, however maybe not in all cases as we do
> > today.
> > 
> > For example it seems reasonable to not hold retuning before sending
> > CMD6 for cache flush, but instead it should be sufficient to hold it
> > before polling for busy in __mmc_switch().
> > 
> > Adrian, what's your thoughts on this?
> 
> mmc_retune_hold() and mmc_retune_release() are designed to go around a group
> of commands, but re-tuning can still be done before the first command. i.e.
> 
> 	mmc_retune_hold
> 	<re-tune can happen here>
> 	cmd A
> 	<re-tune not allowed here>
> 	cmd B
> 	<re-tune not allowed here>
> 	cmd C
> 	mmc_retune_release
> 
> That is the same in the retry case:
> 
> 	mmc_retune_hold
> 	<re-tune can happen here>
> 	cmd A
> 	<re-tune not allowed here>
> 	retry cmd A
> 	<re-tune not allowed here>
> 	cmd B
> 	<re-tune not allowed here>
> 	cmd C
> 	mmc_retune_release

Thanks for explain the mechanism of mmc_retune_hold()
> 
> The retry mechanism provided by mmc_wait_for_cmd() and friends really only
> makes sense for simple commands.  In other cases, like this, we need to
> consider what state the card is in.  For __mmc_switch we need to consider
> whether the card is busy or whether a timing change been made.
> 
Yes, different R1B command need different handle, like CMD12, if card is
not in send/recv state, retry of CMD12 will get timeout.
> > 
> >>
> >> [  129.106622]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
> >> [map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is
> >> stable
> >> [  129.109404]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
> >> [map:ffffe03f] [maxlen:19] [final:22]
> >> --------------------> below is next resume and re-init card:
> >> [  129.778454]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set
> >> error -22: 3300000 - 3300000
> >> [  130.016987]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
> >> [map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10
> >> is not stable.
> > 
> > As you suspect the tuning didn't work out correctly, then why don't
> > you retry one more time?
> 
> Or restore the previously known good result?
> 
At runtime, we don't know if current tune result is OK or unstable.
when analysis the fail log, then find the difference between OK and NG
result.
> > 
> >> [  130.019556]  (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase:
> >> [map:ffffc03f] [maxlen:18] [final:23]
> >> [  130.124279]  (1)[1248:system_server]mmc0: cache flush error -84
> >> [  130.125058]  (1)[1248:system_server]dpm_run_callback():
> >> mmc_bus_suspend+0x0/0x4c returns -84
> >> [  130.126104]  (1)[1248:system_server]PM: Device mmc0:0001 failed to
> >> suspend: error -84
> >>
> >>
> >>
> > 
> > Kind regards
> > Uffe
> > 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ulf Hansson March 28, 2017, 9:59 a.m. UTC | #11

[...]

>> The retry mechanism provided by mmc_wait_for_cmd() and friends really only
>> makes sense for simple commands.  In other cases, like this, we need to
>> consider what state the card is in.  For __mmc_switch we need to consider
>> whether the card is busy or whether a timing change been made.
>
> I definitely agree. We should remove retries for CMD6 and perhaps also
> for some other cases.
>
> When we have changed the above in __mmc_switch(), the change Chaotian
> suggest gets a different impact, as it would potentially allow a
> re-tuning to happen before the next CMD1to poll for busy or to check

/s/CMD1/CMD13

> the switch status. This isn't okay.
>
> This all sounds to me that Chaotian's issue may not all be related to
> tuning, but to the CMD6 switch sequence itself. However I may be wrong
> - of course. :-)
>
> [...]
>
> Kind regards
> Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chaotian Jing March 29, 2017, 2:27 a.m. UTC | #12

On Tue, 2017-03-28 at 11:59 +0200, Ulf Hansson wrote:
> [...]
> 
> >> The retry mechanism provided by mmc_wait_for_cmd() and friends really only
> >> makes sense for simple commands.  In other cases, like this, we need to
> >> consider what state the card is in.  For __mmc_switch we need to consider
> >> whether the card is busy or whether a timing change been made.
> >
> > I definitely agree. We should remove retries for CMD6 and perhaps also
> > for some other cases.
> >
just only remove retries ? how to do error handle of CMD6 response CRC
error ?
> > When we have changed the above in __mmc_switch(), the change Chaotian
> > suggest gets a different impact, as it would potentially allow a
> > re-tuning to happen before the next CMD1to poll for busy or to check
> 
> /s/CMD1/CMD13
> 
> > the switch status. This isn't okay.
> >
> > This all sounds to me that Chaotian's issue may not all be related to
> > tuning, but to the CMD6 switch sequence itself. However I may be wrong
> > - of course. :-)
We had already noticed this issue and do busy check before CMD6 was sent
in our host driver. so that for me, the rest work is to do re-tune
before the next cmd6...
> >
> > [...]
> >
> > Kind regards
> > Uffe


--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Ulf Hansson March 29, 2017, 10:33 a.m. UTC | #13

[...]

>> Below is the fail log of suspend fail.
>> the normal command tune result should be 0xffffff9ff, but some time, we
>> get the tune result of 0xffffffff, then we choose the 10 as the best
>> tune parameter, which is not stable.
>> I know that we should focus on why we get the result of 0xffffffff, this
>> may be result of device/host timing shifting while tuning. but what I
>> want to do is that when get a response CRC error, we can do re-tune to
>> recovery it, but not only return the -84 and cause suspend fail
>> eventually. if all hardware are perfect, then we don't need the re-tune
>> mechanism.
>
> Thanks for elaborating!
>
> Can you please also tell exactly which of the CMD6 commands in the
> suspend sequence that is triggering this problem? Cache flush? Power
> off notification?

You didn't answer this question. I would really like to know the
sequence of the commands you see that are being sent to the card
during suspend. And of course in particular what command that fails.

[...]

Kind regards
Uffe
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Chaotian Jing March 29, 2017, 10:36 a.m. UTC | #14

On Wed, 2017-03-29 at 12:33 +0200, Ulf Hansson wrote:
> [...]
> 
> >> Below is the fail log of suspend fail.
> >> the normal command tune result should be 0xffffff9ff, but some time, we
> >> get the tune result of 0xffffffff, then we choose the 10 as the best
> >> tune parameter, which is not stable.
> >> I know that we should focus on why we get the result of 0xffffffff, this
> >> may be result of device/host timing shifting while tuning. but what I
> >> want to do is that when get a response CRC error, we can do re-tune to
> >> recovery it, but not only return the -84 and cause suspend fail
> >> eventually. if all hardware are perfect, then we don't need the re-tune
> >> mechanism.
> >
> > Thanks for elaborating!
> >
> > Can you please also tell exactly which of the CMD6 commands in the
> > suspend sequence that is triggering this problem? Cache flush? Power
> > off notification?
> 
> You didn't answer this question. I would really like to know the
> sequence of the commands you see that are being sent to the card
> during suspend. And of course in particular what command that fails.
> 
Sorry, it's cache flush.
I assume that you have noticed the fail log(it shows that cache flush
error)
> [...]
> 
> Kind regards
> Uffe


--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

mmc: core: Do not hold re-tuning during CMD6 commands

Commit Message

Comments

Patch