Message ID | 1490336341-22292-2-git-send-email-chaotian.jing@mediatek.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 24/03/17 08:19, Chaotian Jing wrote: > this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning > during switch commands")' > Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error, > then should do re-tune before the next CMD6 was sent. > > Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> > --- > drivers/mmc/core/mmc_ops.c | 3 +-- > 1 file changed, 1 insertion(+), 2 deletions(-) > > diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c > index fe80f26..6931927 100644 > --- a/drivers/mmc/core/mmc_ops.c > +++ b/drivers/mmc/core/mmc_ops.c > @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, > bool use_r1b_resp = use_busy_signal; > unsigned char old_timing = host->ios.timing; > > - mmc_retune_hold(host); > - > /* > * If the cmd timeout and the max_busy_timeout of the host are both > * specified, let's validate them. A failure means we need to prevent > @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, > cmd.sanitize_busy = true; > > err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES); > + mmc_retune_hold(host); That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as it is here: https://marc.info/?l=linux-mmc&m=148940903816582 But using "retries" with commands that have busy-waiting on the data line doesn't make much sense anyway. Particularly with CRC errors, I would expect the card is actually busily doing the switch and we need only to wait for it. The same can be true for timeout errors. For some CMD6 we might need to send CMD12 if the card is busy after an error. I would prefer an explicit attempt at recovery from CMD6 errors.
On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote: > On 24/03/17 08:19, Chaotian Jing wrote: > > this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning > > during switch commands")' > > Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error, > > then should do re-tune before the next CMD6 was sent. > > > > Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> > > --- > > drivers/mmc/core/mmc_ops.c | 3 +-- > > 1 file changed, 1 insertion(+), 2 deletions(-) > > > > diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c > > index fe80f26..6931927 100644 > > --- a/drivers/mmc/core/mmc_ops.c > > +++ b/drivers/mmc/core/mmc_ops.c > > @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, > > bool use_r1b_resp = use_busy_signal; > > unsigned char old_timing = host->ios.timing; > > > > - mmc_retune_hold(host); > > - > > /* > > * If the cmd timeout and the max_busy_timeout of the host are both > > * specified, let's validate them. A failure means we need to prevent > > @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, > > cmd.sanitize_busy = true; > > > > err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES); > > + mmc_retune_hold(host); > > That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as > it is here: > > https://marc.info/?l=linux-mmc&m=148940903816582 > > But using "retries" with commands that have busy-waiting on the data line > doesn't make much sense anyway. Particularly with CRC errors, I would > expect the card is actually busily doing the switch and we need only to wait > for it. The same can be true for timeout errors. For some CMD6 we might > need to send CMD12 if the card is busy after an error. I would prefer an > explicit attempt at recovery from CMD6 errors. > It's the host driver's responsibility to ensure card is not in busy state before issue the next R1B command, or the MMC core layer needs do extra check/waiting before issue a R1B command. I think the purpose of "re-tune" is trying to cover particular case(eg. voltage fluctuate or EMI or some glitch of host/device which caused CRC error) , but in such cases, too many cases are disable re-tune function by mmc_retune_hold(), for example, in this case, if a response CRC error got then we never have chance to recover it. then cause system cannot access emmc or suspend/resume fail.
On 24/03/17 10:32, Chaotian Jing wrote: > On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote: >> On 24/03/17 08:19, Chaotian Jing wrote: >>> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning >>> during switch commands")' >>> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error, >>> then should do re-tune before the next CMD6 was sent. >>> >>> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> >>> --- >>> drivers/mmc/core/mmc_ops.c | 3 +-- >>> 1 file changed, 1 insertion(+), 2 deletions(-) >>> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c >>> index fe80f26..6931927 100644 >>> --- a/drivers/mmc/core/mmc_ops.c >>> +++ b/drivers/mmc/core/mmc_ops.c >>> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, >>> bool use_r1b_resp = use_busy_signal; >>> unsigned char old_timing = host->ios.timing; >>> >>> - mmc_retune_hold(host); >>> - >>> /* >>> * If the cmd timeout and the max_busy_timeout of the host are both >>> * specified, let's validate them. A failure means we need to prevent >>> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, >>> cmd.sanitize_busy = true; >>> >>> err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES); >>> + mmc_retune_hold(host); >> >> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as >> it is here: >> >> https://marc.info/?l=linux-mmc&m=148940903816582 >> >> But using "retries" with commands that have busy-waiting on the data line >> doesn't make much sense anyway. Particularly with CRC errors, I would >> expect the card is actually busily doing the switch and we need only to wait >> for it. The same can be true for timeout errors. For some CMD6 we might >> need to send CMD12 if the card is busy after an error. I would prefer an >> explicit attempt at recovery from CMD6 errors. >> > > It's the host driver's responsibility to ensure card is not in busy > state before issue the next R1B command, or the MMC core layer needs do > extra check/waiting before issue a R1B command. Better to deal with cards stuck in busy from the places where busy-waiting is expected. > I think the purpose of "re-tune" is trying to cover particular case(eg. > voltage fluctuate or EMI or some glitch of host/device which caused CRC > error) No, re-tuning is to compensate for drift caused primarily by temperature change. > error) , but in such cases, too many cases are disable re-tune function > by mmc_retune_hold(), for example, in this case, if a response CRC error > got then we never have chance to recover it. then cause system cannot > access emmc or suspend/resume fail. Maybe you have a hardware problem.
On Fri, 2017-03-24 at 11:19 +0200, Adrian Hunter wrote: > On 24/03/17 10:32, Chaotian Jing wrote: > > On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote: > >> On 24/03/17 08:19, Chaotian Jing wrote: > >>> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning > >>> during switch commands")' > >>> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error, > >>> then should do re-tune before the next CMD6 was sent. > >>> > >>> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> > >>> --- > >>> drivers/mmc/core/mmc_ops.c | 3 +-- > >>> 1 file changed, 1 insertion(+), 2 deletions(-) > >>> > >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c > >>> index fe80f26..6931927 100644 > >>> --- a/drivers/mmc/core/mmc_ops.c > >>> +++ b/drivers/mmc/core/mmc_ops.c > >>> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, > >>> bool use_r1b_resp = use_busy_signal; > >>> unsigned char old_timing = host->ios.timing; > >>> > >>> - mmc_retune_hold(host); > >>> - > >>> /* > >>> * If the cmd timeout and the max_busy_timeout of the host are both > >>> * specified, let's validate them. A failure means we need to prevent > >>> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, > >>> cmd.sanitize_busy = true; > >>> > >>> err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES); > >>> + mmc_retune_hold(host); > >> > >> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as > >> it is here: > >> > >> https://marc.info/?l=linux-mmc&m=148940903816582 > >> > >> But using "retries" with commands that have busy-waiting on the data line > >> doesn't make much sense anyway. Particularly with CRC errors, I would > >> expect the card is actually busily doing the switch and we need only to wait > >> for it. The same can be true for timeout errors. For some CMD6 we might > >> need to send CMD12 if the card is busy after an error. I would prefer an > >> explicit attempt at recovery from CMD6 errors. > >> > > > > It's the host driver's responsibility to ensure card is not in busy > > state before issue the next R1B command, or the MMC core layer needs do > > extra check/waiting before issue a R1B command. > > Better to deal with cards stuck in busy from the places where busy-waiting > is expected. > Yes, if a R1B command got response CRC error, we can do busy-waiting in the error hander funtion(mmc_wait_for_req_done()) > > I think the purpose of "re-tune" is trying to cover particular case(eg. > > voltage fluctuate or EMI or some glitch of host/device which caused CRC > > error) > > No, re-tuning is to compensate for drift caused primarily by temperature change. > Yes, by JEDEC spec, temperature change cause timing drift of EMMC device, but, as you mentioned, maybe I have a hardware problem of host, but needs Software to cover it. so that we are doing our best to do re-tune if got CRC error. if could recover it, then it's better than system hung. > > error) , but in such cases, too many cases are disable re-tune function > > by mmc_retune_hold(), for example, in this case, if a response CRC error > > got then we never have chance to recover it. then cause system cannot > > access emmc or suspend/resume fail. > > Maybe you have a hardware problem. >
On 24 March 2017 at 10:40, Chaotian Jing <chaotian.jing@mediatek.com> wrote: > On Fri, 2017-03-24 at 11:19 +0200, Adrian Hunter wrote: >> On 24/03/17 10:32, Chaotian Jing wrote: >> > On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote: >> >> On 24/03/17 08:19, Chaotian Jing wrote: >> >>> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning >> >>> during switch commands")' >> >>> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error, >> >>> then should do re-tune before the next CMD6 was sent. >> >>> >> >>> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> >> >>> --- >> >>> drivers/mmc/core/mmc_ops.c | 3 +-- >> >>> 1 file changed, 1 insertion(+), 2 deletions(-) >> >>> >> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c >> >>> index fe80f26..6931927 100644 >> >>> --- a/drivers/mmc/core/mmc_ops.c >> >>> +++ b/drivers/mmc/core/mmc_ops.c >> >>> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, >> >>> bool use_r1b_resp = use_busy_signal; >> >>> unsigned char old_timing = host->ios.timing; >> >>> >> >>> - mmc_retune_hold(host); >> >>> - >> >>> /* >> >>> * If the cmd timeout and the max_busy_timeout of the host are both >> >>> * specified, let's validate them. A failure means we need to prevent >> >>> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, >> >>> cmd.sanitize_busy = true; >> >>> >> >>> err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES); >> >>> + mmc_retune_hold(host); >> >> >> >> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as >> >> it is here: >> >> >> >> https://marc.info/?l=linux-mmc&m=148940903816582 >> >> >> >> But using "retries" with commands that have busy-waiting on the data line >> >> doesn't make much sense anyway. Particularly with CRC errors, I would >> >> expect the card is actually busily doing the switch and we need only to wait >> >> for it. The same can be true for timeout errors. For some CMD6 we might >> >> need to send CMD12 if the card is busy after an error. I would prefer an >> >> explicit attempt at recovery from CMD6 errors. >> >> >> > >> > It's the host driver's responsibility to ensure card is not in busy >> > state before issue the next R1B command, or the MMC core layer needs do >> > extra check/waiting before issue a R1B command. >> >> Better to deal with cards stuck in busy from the places where busy-waiting >> is expected. >> > Yes, if a R1B command got response CRC error, we can do busy-waiting in > the error hander funtion(mmc_wait_for_req_done()) That would introduce too big changes and I am sure it will cause/hide other problems. If there is a problem in __mmc_switch(), let's try to fix it there first. >> > I think the purpose of "re-tune" is trying to cover particular case(eg. >> > voltage fluctuate or EMI or some glitch of host/device which caused CRC >> > error) >> >> No, re-tuning is to compensate for drift caused primarily by temperature change. >> > Yes, by JEDEC spec, temperature change cause timing drift of EMMC > device, but, as you mentioned, maybe I have a hardware problem of host, > but needs Software to cover it. so that we are doing our best to do > re-tune if got CRC error. if could recover it, then it's better than > system hung. Exactly in what cases do you get CRC errors for CMD6. We need a full cmd log to understand and to help. >> > error) , but in such cases, too many cases are disable re-tune function >> > by mmc_retune_hold(), for example, in this case, if a response CRC error >> > got then we never have chance to recover it. then cause system cannot >> > access emmc or suspend/resume fail. >> >> Maybe you have a hardware problem. There is no way I am going to accept patches touching this part of the mmc core, without providing real evidence for how it solves a problem. To me, it seems like you are applying a workaround for another issue. Again, try to provide us with some more data and logs, then perhaps we can help narrow down the issues. Kind regards Uffe
On Fri, 2017-03-24 at 11:49 +0100, Ulf Hansson wrote: > On 24 March 2017 at 10:40, Chaotian Jing <chaotian.jing@mediatek.com> wrote: > > On Fri, 2017-03-24 at 11:19 +0200, Adrian Hunter wrote: > >> On 24/03/17 10:32, Chaotian Jing wrote: > >> > On Fri, 2017-03-24 at 09:52 +0200, Adrian Hunter wrote: > >> >> On 24/03/17 08:19, Chaotian Jing wrote: > >> >>> this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning > >> >>> during switch commands")' > >> >>> Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error, > >> >>> then should do re-tune before the next CMD6 was sent. > >> >>> > >> >>> Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> > >> >>> --- > >> >>> drivers/mmc/core/mmc_ops.c | 3 +-- > >> >>> 1 file changed, 1 insertion(+), 2 deletions(-) > >> >>> > >> >>> diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c > >> >>> index fe80f26..6931927 100644 > >> >>> --- a/drivers/mmc/core/mmc_ops.c > >> >>> +++ b/drivers/mmc/core/mmc_ops.c > >> >>> @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, > >> >>> bool use_r1b_resp = use_busy_signal; > >> >>> unsigned char old_timing = host->ios.timing; > >> >>> > >> >>> - mmc_retune_hold(host); > >> >>> - > >> >>> /* > >> >>> * If the cmd timeout and the max_busy_timeout of the host are both > >> >>> * specified, let's validate them. A failure means we need to prevent > >> >>> @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, > >> >>> cmd.sanitize_busy = true; > >> >>> > >> >>> err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES); > >> >>> + mmc_retune_hold(host); > >> >> > >> >> That is not how mmc_retune_hold() works, you need mmc_retune_hold_now() as > >> >> it is here: > >> >> > >> >> https://marc.info/?l=linux-mmc&m=148940903816582 > >> >> > >> >> But using "retries" with commands that have busy-waiting on the data line > >> >> doesn't make much sense anyway. Particularly with CRC errors, I would > >> >> expect the card is actually busily doing the switch and we need only to wait > >> >> for it. The same can be true for timeout errors. For some CMD6 we might > >> >> need to send CMD12 if the card is busy after an error. I would prefer an > >> >> explicit attempt at recovery from CMD6 errors. > >> >> > >> > > >> > It's the host driver's responsibility to ensure card is not in busy > >> > state before issue the next R1B command, or the MMC core layer needs do > >> > extra check/waiting before issue a R1B command. > >> > >> Better to deal with cards stuck in busy from the places where busy-waiting > >> is expected. > >> > > Yes, if a R1B command got response CRC error, we can do busy-waiting in > > the error hander funtion(mmc_wait_for_req_done()) > > That would introduce too big changes and I am sure it will cause/hide > other problems. > > If there is a problem in __mmc_switch(), let's try to fix it there first. > Anyway, it is a bug of retry 3 times at max but without check current card status and ensure it's in transfer state before next retry. > >> > I think the purpose of "re-tune" is trying to cover particular case(eg. > >> > voltage fluctuate or EMI or some glitch of host/device which caused CRC > >> > error) > >> > >> No, re-tuning is to compensate for drift caused primarily by temperature change. > >> > > Yes, by JEDEC spec, temperature change cause timing drift of EMMC > > device, but, as you mentioned, maybe I have a hardware problem of host, > > but needs Software to cover it. so that we are doing our best to do > > re-tune if got CRC error. if could recover it, then it's better than > > system hung. > > Exactly in what cases do you get CRC errors for CMD6. We need a full > cmd log to understand and to help. > > >> > error) , but in such cases, too many cases are disable re-tune function > >> > by mmc_retune_hold(), for example, in this case, if a response CRC error > >> > got then we never have chance to recover it. then cause system cannot > >> > access emmc or suspend/resume fail. > >> > >> Maybe you have a hardware problem. > > There is no way I am going to accept patches touching this part of the > mmc core, without providing real evidence for how it solves a problem. > To me, it seems like you are applying a workaround for another issue. > > Again, try to provide us with some more data and logs, then perhaps we > can help narrow down the issues. > > Kind regards > Uffe Below is the fail log of suspend fail. the normal command tune result should be 0xffffff9ff, but some time, we get the tune result of 0xffffffff, then we choose the 10 as the best tune parameter, which is not stable. I know that we should focus on why we get the result of 0xffffffff, this may be result of device/host timing shifting while tuning. but what I want to do is that when get a response CRC error, we can do re-tune to recovery it, but not only return the -84 and cause suspend fail eventually. if all hardware are perfect, then we don't need the re-tune mechanism. as Adrian's comment, if temperature change at here caused CMD6 response CRC error, then how to recovery it ? [ 129.106622] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: [map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is stable [ 129.109404] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: [map:ffffe03f] [maxlen:19] [final:22] --------------------> below is next resume and re-init card: [ 129.778454] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set error -22: 3300000 - 3300000 [ 130.016987] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: [map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10 is not stable. [ 130.019556] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: [map:ffffc03f] [maxlen:18] [final:23] [ 130.124279] (1)[1248:system_server]mmc0: cache flush error -84 [ 130.125058] (1)[1248:system_server]dpm_run_callback(): mmc_bus_suspend+0x0/0x4c returns -84 [ 130.126104] (1)[1248:system_server]PM: Device mmc0:0001 failed to suspend: error -84
[...] >> >> If there is a problem in __mmc_switch(), let's try to fix it there first. >> > Anyway, it is a bug of retry 3 times at max but without check current > card status and ensure it's in transfer state before next retry. Correct. Do you want to send a patch that fixes this? Otherwise I can do it... >> >> > I think the purpose of "re-tune" is trying to cover particular case(eg. >> >> > voltage fluctuate or EMI or some glitch of host/device which caused CRC >> >> > error) >> >> >> >> No, re-tuning is to compensate for drift caused primarily by temperature change. >> >> >> > Yes, by JEDEC spec, temperature change cause timing drift of EMMC >> > device, but, as you mentioned, maybe I have a hardware problem of host, >> > but needs Software to cover it. so that we are doing our best to do >> > re-tune if got CRC error. if could recover it, then it's better than >> > system hung. >> >> Exactly in what cases do you get CRC errors for CMD6. We need a full >> cmd log to understand and to help. >> >> >> > error) , but in such cases, too many cases are disable re-tune function >> >> > by mmc_retune_hold(), for example, in this case, if a response CRC error >> >> > got then we never have chance to recover it. then cause system cannot >> >> > access emmc or suspend/resume fail. >> >> >> >> Maybe you have a hardware problem. >> >> There is no way I am going to accept patches touching this part of the >> mmc core, without providing real evidence for how it solves a problem. >> To me, it seems like you are applying a workaround for another issue. >> >> Again, try to provide us with some more data and logs, then perhaps we >> can help narrow down the issues. >> >> Kind regards >> Uffe > > Below is the fail log of suspend fail. > the normal command tune result should be 0xffffff9ff, but some time, we > get the tune result of 0xffffffff, then we choose the 10 as the best > tune parameter, which is not stable. > I know that we should focus on why we get the result of 0xffffffff, this > may be result of device/host timing shifting while tuning. but what I > want to do is that when get a response CRC error, we can do re-tune to > recovery it, but not only return the -84 and cause suspend fail > eventually. if all hardware are perfect, then we don't need the re-tune > mechanism. Thanks for elaborating! Can you please also tell exactly which of the CMD6 commands in the suspend sequence that is triggering this problem? Cache flush? Power off notification? > > as Adrian's comment, if temperature change at here caused CMD6 response > CRC error, then how to recovery it ? So in your case, allowing re-tuning a little longer in __mmc_switch() solves your problem. Clearly there are cases when we need to prevent re-tuning when sending CMD6, however maybe not in all cases as we do today. For example it seems reasonable to not hold retuning before sending CMD6 for cache flush, but instead it should be sufficient to hold it before polling for busy in __mmc_switch(). Adrian, what's your thoughts on this? > > [ 129.106622] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: > [map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is > stable > [ 129.109404] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: > [map:ffffe03f] [maxlen:19] [final:22] > --------------------> below is next resume and re-init card: > [ 129.778454] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set > error -22: 3300000 - 3300000 > [ 130.016987] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: > [map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10 > is not stable. As you suspect the tuning didn't work out correctly, then why don't you retry one more time? > [ 130.019556] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: > [map:ffffc03f] [maxlen:18] [final:23] > [ 130.124279] (1)[1248:system_server]mmc0: cache flush error -84 > [ 130.125058] (1)[1248:system_server]dpm_run_callback(): > mmc_bus_suspend+0x0/0x4c returns -84 > [ 130.126104] (1)[1248:system_server]PM: Device mmc0:0001 failed to > suspend: error -84 > > > Kind regards Uffe
On 28/03/17 11:30, Ulf Hansson wrote: > [...] > >>> >>> If there is a problem in __mmc_switch(), let's try to fix it there first. >>> >> Anyway, it is a bug of retry 3 times at max but without check current >> card status and ensure it's in transfer state before next retry. > > Correct. Do you want to send a patch that fixes this? Otherwise I can do it... > >>>>>> I think the purpose of "re-tune" is trying to cover particular case(eg. >>>>>> voltage fluctuate or EMI or some glitch of host/device which caused CRC >>>>>> error) >>>>> >>>>> No, re-tuning is to compensate for drift caused primarily by temperature change. >>>>> >>>> Yes, by JEDEC spec, temperature change cause timing drift of EMMC >>>> device, but, as you mentioned, maybe I have a hardware problem of host, >>>> but needs Software to cover it. so that we are doing our best to do >>>> re-tune if got CRC error. if could recover it, then it's better than >>>> system hung. >>> >>> Exactly in what cases do you get CRC errors for CMD6. We need a full >>> cmd log to understand and to help. >>> >>>>>> error) , but in such cases, too many cases are disable re-tune function >>>>>> by mmc_retune_hold(), for example, in this case, if a response CRC error >>>>>> got then we never have chance to recover it. then cause system cannot >>>>>> access emmc or suspend/resume fail. >>>>> >>>>> Maybe you have a hardware problem. >>> >>> There is no way I am going to accept patches touching this part of the >>> mmc core, without providing real evidence for how it solves a problem. >>> To me, it seems like you are applying a workaround for another issue. >>> >>> Again, try to provide us with some more data and logs, then perhaps we >>> can help narrow down the issues. >>> >>> Kind regards >>> Uffe >> >> Below is the fail log of suspend fail. >> the normal command tune result should be 0xffffff9ff, but some time, we >> get the tune result of 0xffffffff, then we choose the 10 as the best >> tune parameter, which is not stable. >> I know that we should focus on why we get the result of 0xffffffff, this >> may be result of device/host timing shifting while tuning. but what I >> want to do is that when get a response CRC error, we can do re-tune to >> recovery it, but not only return the -84 and cause suspend fail >> eventually. if all hardware are perfect, then we don't need the re-tune >> mechanism. > > Thanks for elaborating! > > Can you please also tell exactly which of the CMD6 commands in the > suspend sequence that is triggering this problem? Cache flush? Power > off notification? > >> >> as Adrian's comment, if temperature change at here caused CMD6 response >> CRC error, then how to recovery it ? > > So in your case, allowing re-tuning a little longer in __mmc_switch() > solves your problem. Clearly there are cases when we need to prevent > re-tuning when sending CMD6, however maybe not in all cases as we do > today. > > For example it seems reasonable to not hold retuning before sending > CMD6 for cache flush, but instead it should be sufficient to hold it > before polling for busy in __mmc_switch(). > > Adrian, what's your thoughts on this? mmc_retune_hold() and mmc_retune_release() are designed to go around a group of commands, but re-tuning can still be done before the first command. i.e. mmc_retune_hold <re-tune can happen here> cmd A <re-tune not allowed here> cmd B <re-tune not allowed here> cmd C mmc_retune_release That is the same in the retry case: mmc_retune_hold <re-tune can happen here> cmd A <re-tune not allowed here> retry cmd A <re-tune not allowed here> cmd B <re-tune not allowed here> cmd C mmc_retune_release The retry mechanism provided by mmc_wait_for_cmd() and friends really only makes sense for simple commands. In other cases, like this, we need to consider what state the card is in. For __mmc_switch we need to consider whether the card is busy or whether a timing change been made. > >> >> [ 129.106622] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: >> [map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is >> stable >> [ 129.109404] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: >> [map:ffffe03f] [maxlen:19] [final:22] >> --------------------> below is next resume and re-init card: >> [ 129.778454] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set >> error -22: 3300000 - 3300000 >> [ 130.016987] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: >> [map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10 >> is not stable. > > As you suspect the tuning didn't work out correctly, then why don't > you retry one more time? Or restore the previously known good result? > >> [ 130.019556] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: >> [map:ffffc03f] [maxlen:18] [final:23] >> [ 130.124279] (1)[1248:system_server]mmc0: cache flush error -84 >> [ 130.125058] (1)[1248:system_server]dpm_run_callback(): >> mmc_bus_suspend+0x0/0x4c returns -84 >> [ 130.126104] (1)[1248:system_server]PM: Device mmc0:0001 failed to >> suspend: error -84 >> >> >> > > Kind regards > Uffe >
On 28 March 2017 at 11:01, Adrian Hunter <adrian.hunter@intel.com> wrote: > On 28/03/17 11:30, Ulf Hansson wrote: >> [...] >> >>>> >>>> If there is a problem in __mmc_switch(), let's try to fix it there first. >>>> >>> Anyway, it is a bug of retry 3 times at max but without check current >>> card status and ensure it's in transfer state before next retry. >> >> Correct. Do you want to send a patch that fixes this? Otherwise I can do it... >> >>>>>>> I think the purpose of "re-tune" is trying to cover particular case(eg. >>>>>>> voltage fluctuate or EMI or some glitch of host/device which caused CRC >>>>>>> error) >>>>>> >>>>>> No, re-tuning is to compensate for drift caused primarily by temperature change. >>>>>> >>>>> Yes, by JEDEC spec, temperature change cause timing drift of EMMC >>>>> device, but, as you mentioned, maybe I have a hardware problem of host, >>>>> but needs Software to cover it. so that we are doing our best to do >>>>> re-tune if got CRC error. if could recover it, then it's better than >>>>> system hung. >>>> >>>> Exactly in what cases do you get CRC errors for CMD6. We need a full >>>> cmd log to understand and to help. >>>> >>>>>>> error) , but in such cases, too many cases are disable re-tune function >>>>>>> by mmc_retune_hold(), for example, in this case, if a response CRC error >>>>>>> got then we never have chance to recover it. then cause system cannot >>>>>>> access emmc or suspend/resume fail. >>>>>> >>>>>> Maybe you have a hardware problem. >>>> >>>> There is no way I am going to accept patches touching this part of the >>>> mmc core, without providing real evidence for how it solves a problem. >>>> To me, it seems like you are applying a workaround for another issue. >>>> >>>> Again, try to provide us with some more data and logs, then perhaps we >>>> can help narrow down the issues. >>>> >>>> Kind regards >>>> Uffe >>> >>> Below is the fail log of suspend fail. >>> the normal command tune result should be 0xffffff9ff, but some time, we >>> get the tune result of 0xffffffff, then we choose the 10 as the best >>> tune parameter, which is not stable. >>> I know that we should focus on why we get the result of 0xffffffff, this >>> may be result of device/host timing shifting while tuning. but what I >>> want to do is that when get a response CRC error, we can do re-tune to >>> recovery it, but not only return the -84 and cause suspend fail >>> eventually. if all hardware are perfect, then we don't need the re-tune >>> mechanism. >> >> Thanks for elaborating! >> >> Can you please also tell exactly which of the CMD6 commands in the >> suspend sequence that is triggering this problem? Cache flush? Power >> off notification? >> >>> >>> as Adrian's comment, if temperature change at here caused CMD6 response >>> CRC error, then how to recovery it ? >> >> So in your case, allowing re-tuning a little longer in __mmc_switch() >> solves your problem. Clearly there are cases when we need to prevent >> re-tuning when sending CMD6, however maybe not in all cases as we do >> today. >> >> For example it seems reasonable to not hold retuning before sending >> CMD6 for cache flush, but instead it should be sufficient to hold it >> before polling for busy in __mmc_switch(). >> >> Adrian, what's your thoughts on this? > > mmc_retune_hold() and mmc_retune_release() are designed to go around a group > of commands, but re-tuning can still be done before the first command. i.e. > > mmc_retune_hold > <re-tune can happen here> > cmd A > <re-tune not allowed here> > cmd B > <re-tune not allowed here> > cmd C > mmc_retune_release > > That is the same in the retry case > > mmc_retune_hold > <re-tune can happen here> > cmd A > <re-tune not allowed here> > retry cmd A > <re-tune not allowed here> > cmd B > <re-tune not allowed here> > cmd C > mmc_retune_release Ohh, right! Thanks for the detailed description. > > The retry mechanism provided by mmc_wait_for_cmd() and friends really only > makes sense for simple commands. In other cases, like this, we need to > consider what state the card is in. For __mmc_switch we need to consider > whether the card is busy or whether a timing change been made. I definitely agree. We should remove retries for CMD6 and perhaps also for some other cases. When we have changed the above in __mmc_switch(), the change Chaotian suggest gets a different impact, as it would potentially allow a re-tuning to happen before the next CMD1to poll for busy or to check the switch status. This isn't okay. This all sounds to me that Chaotian's issue may not all be related to tuning, but to the CMD6 switch sequence itself. However I may be wrong - of course. :-) [...] Kind regards Uffe
On Tue, 2017-03-28 at 12:01 +0300, Adrian Hunter wrote: > On 28/03/17 11:30, Ulf Hansson wrote: > > [...] > > > >>> > >>> If there is a problem in __mmc_switch(), let's try to fix it there first. > >>> > >> Anyway, it is a bug of retry 3 times at max but without check current > >> card status and ensure it's in transfer state before next retry. > > > > Correct. Do you want to send a patch that fixes this? Otherwise I can do it... > > Actually, it may hard to check card status, if host supports MMC_CAP_WAIT_WHILE_BUSY or mmc->card_busy(), it will easy to know current card status, or we must issue CMD13 to do polling, but as you know, CMD13 may also gets response CRC error. > >>>>>> I think the purpose of "re-tune" is trying to cover particular case(eg. > >>>>>> voltage fluctuate or EMI or some glitch of host/device which caused CRC > >>>>>> error) > >>>>> > >>>>> No, re-tuning is to compensate for drift caused primarily by temperature change. > >>>>> > >>>> Yes, by JEDEC spec, temperature change cause timing drift of EMMC > >>>> device, but, as you mentioned, maybe I have a hardware problem of host, > >>>> but needs Software to cover it. so that we are doing our best to do > >>>> re-tune if got CRC error. if could recover it, then it's better than > >>>> system hung. > >>> > >>> Exactly in what cases do you get CRC errors for CMD6. We need a full > >>> cmd log to understand and to help. > >>> > >>>>>> error) , but in such cases, too many cases are disable re-tune function > >>>>>> by mmc_retune_hold(), for example, in this case, if a response CRC error > >>>>>> got then we never have chance to recover it. then cause system cannot > >>>>>> access emmc or suspend/resume fail. > >>>>> > >>>>> Maybe you have a hardware problem. > >>> > >>> There is no way I am going to accept patches touching this part of the > >>> mmc core, without providing real evidence for how it solves a problem. > >>> To me, it seems like you are applying a workaround for another issue. > >>> > >>> Again, try to provide us with some more data and logs, then perhaps we > >>> can help narrow down the issues. > >>> > >>> Kind regards > >>> Uffe > >> > >> Below is the fail log of suspend fail. > >> the normal command tune result should be 0xffffff9ff, but some time, we > >> get the tune result of 0xffffffff, then we choose the 10 as the best > >> tune parameter, which is not stable. > >> I know that we should focus on why we get the result of 0xffffffff, this > >> may be result of device/host timing shifting while tuning. but what I > >> want to do is that when get a response CRC error, we can do re-tune to > >> recovery it, but not only return the -84 and cause suspend fail > >> eventually. if all hardware are perfect, then we don't need the re-tune > >> mechanism. > > > > Thanks for elaborating! > > > > Can you please also tell exactly which of the CMD6 commands in the > > suspend sequence that is triggering this problem? Cache flush? Power > > off notification? > > > >> > >> as Adrian's comment, if temperature change at here caused CMD6 response > >> CRC error, then how to recovery it ? > > > > So in your case, allowing re-tuning a little longer in __mmc_switch() > > solves your problem. Clearly there are cases when we need to prevent > > re-tuning when sending CMD6, however maybe not in all cases as we do > > today. > > > > For example it seems reasonable to not hold retuning before sending > > CMD6 for cache flush, but instead it should be sufficient to hold it > > before polling for busy in __mmc_switch(). > > > > Adrian, what's your thoughts on this? > > mmc_retune_hold() and mmc_retune_release() are designed to go around a group > of commands, but re-tuning can still be done before the first command. i.e. > > mmc_retune_hold > <re-tune can happen here> > cmd A > <re-tune not allowed here> > cmd B > <re-tune not allowed here> > cmd C > mmc_retune_release > > That is the same in the retry case: > > mmc_retune_hold > <re-tune can happen here> > cmd A > <re-tune not allowed here> > retry cmd A > <re-tune not allowed here> > cmd B > <re-tune not allowed here> > cmd C > mmc_retune_release Thanks for explain the mechanism of mmc_retune_hold() > > The retry mechanism provided by mmc_wait_for_cmd() and friends really only > makes sense for simple commands. In other cases, like this, we need to > consider what state the card is in. For __mmc_switch we need to consider > whether the card is busy or whether a timing change been made. > Yes, different R1B command need different handle, like CMD12, if card is not in send/recv state, retry of CMD12 will get timeout. > > > >> > >> [ 129.106622] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: > >> [map:fffff9ff] [maxlen:21] [final:21] -->current result is OK and 21 is > >> stable > >> [ 129.109404] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: > >> [map:ffffe03f] [maxlen:19] [final:22] > >> --------------------> below is next resume and re-init card: > >> [ 129.778454] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: Regulator set > >> error -22: 3300000 - 3300000 > >> [ 130.016987] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: > >> [map:ffffffff] [maxlen:32] [final:10] --> this result if not OK and 10 > >> is not stable. > > > > As you suspect the tuning didn't work out correctly, then why don't > > you retry one more time? > > Or restore the previously known good result? > At runtime, we don't know if current tune result is OK or unstable. when analysis the fail log, then find the difference between OK and NG result. > > > >> [ 130.019556] (0)[96:mmcqd/0]mtk-msdc 11230000.mmc: phase: > >> [map:ffffc03f] [maxlen:18] [final:23] > >> [ 130.124279] (1)[1248:system_server]mmc0: cache flush error -84 > >> [ 130.125058] (1)[1248:system_server]dpm_run_callback(): > >> mmc_bus_suspend+0x0/0x4c returns -84 > >> [ 130.126104] (1)[1248:system_server]PM: Device mmc0:0001 failed to > >> suspend: error -84 > >> > >> > >> > > > > Kind regards > > Uffe > > >
[...] >> The retry mechanism provided by mmc_wait_for_cmd() and friends really only >> makes sense for simple commands. In other cases, like this, we need to >> consider what state the card is in. For __mmc_switch we need to consider >> whether the card is busy or whether a timing change been made. > > I definitely agree. We should remove retries for CMD6 and perhaps also > for some other cases. > > When we have changed the above in __mmc_switch(), the change Chaotian > suggest gets a different impact, as it would potentially allow a > re-tuning to happen before the next CMD1to poll for busy or to check /s/CMD1/CMD13 > the switch status. This isn't okay. > > This all sounds to me that Chaotian's issue may not all be related to > tuning, but to the CMD6 switch sequence itself. However I may be wrong > - of course. :-) > > [...] > > Kind regards > Uffe
On Tue, 2017-03-28 at 11:59 +0200, Ulf Hansson wrote: > [...] > > >> The retry mechanism provided by mmc_wait_for_cmd() and friends really only > >> makes sense for simple commands. In other cases, like this, we need to > >> consider what state the card is in. For __mmc_switch we need to consider > >> whether the card is busy or whether a timing change been made. > > > > I definitely agree. We should remove retries for CMD6 and perhaps also > > for some other cases. > > just only remove retries ? how to do error handle of CMD6 response CRC error ? > > When we have changed the above in __mmc_switch(), the change Chaotian > > suggest gets a different impact, as it would potentially allow a > > re-tuning to happen before the next CMD1to poll for busy or to check > > /s/CMD1/CMD13 > > > the switch status. This isn't okay. > > > > This all sounds to me that Chaotian's issue may not all be related to > > tuning, but to the CMD6 switch sequence itself. However I may be wrong > > - of course. :-) We had already noticed this issue and do busy check before CMD6 was sent in our host driver. so that for me, the rest work is to do re-tune before the next cmd6... > > > > [...] > > > > Kind regards > > Uffe
[...] >> Below is the fail log of suspend fail. >> the normal command tune result should be 0xffffff9ff, but some time, we >> get the tune result of 0xffffffff, then we choose the 10 as the best >> tune parameter, which is not stable. >> I know that we should focus on why we get the result of 0xffffffff, this >> may be result of device/host timing shifting while tuning. but what I >> want to do is that when get a response CRC error, we can do re-tune to >> recovery it, but not only return the -84 and cause suspend fail >> eventually. if all hardware are perfect, then we don't need the re-tune >> mechanism. > > Thanks for elaborating! > > Can you please also tell exactly which of the CMD6 commands in the > suspend sequence that is triggering this problem? Cache flush? Power > off notification? You didn't answer this question. I would really like to know the sequence of the commands you see that are being sent to the card during suspend. And of course in particular what command that fails. [...] Kind regards Uffe
On Wed, 2017-03-29 at 12:33 +0200, Ulf Hansson wrote: > [...] > > >> Below is the fail log of suspend fail. > >> the normal command tune result should be 0xffffff9ff, but some time, we > >> get the tune result of 0xffffffff, then we choose the 10 as the best > >> tune parameter, which is not stable. > >> I know that we should focus on why we get the result of 0xffffffff, this > >> may be result of device/host timing shifting while tuning. but what I > >> want to do is that when get a response CRC error, we can do re-tune to > >> recovery it, but not only return the -84 and cause suspend fail > >> eventually. if all hardware are perfect, then we don't need the re-tune > >> mechanism. > > > > Thanks for elaborating! > > > > Can you please also tell exactly which of the CMD6 commands in the > > suspend sequence that is triggering this problem? Cache flush? Power > > off notification? > > You didn't answer this question. I would really like to know the > sequence of the commands you see that are being sent to the card > during suspend. And of course in particular what command that fails. > Sorry, it's cache flush. I assume that you have noticed the fail log(it shows that cache flush error) > [...] > > Kind regards > Uffe
diff --git a/drivers/mmc/core/mmc_ops.c b/drivers/mmc/core/mmc_ops.c index fe80f26..6931927 100644 --- a/drivers/mmc/core/mmc_ops.c +++ b/drivers/mmc/core/mmc_ops.c @@ -534,8 +534,6 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, bool use_r1b_resp = use_busy_signal; unsigned char old_timing = host->ios.timing; - mmc_retune_hold(host); - /* * If the cmd timeout and the max_busy_timeout of the host are both * specified, let's validate them. A failure means we need to prevent @@ -567,6 +565,7 @@ int __mmc_switch(struct mmc_card *card, u8 set, u8 index, u8 value, cmd.sanitize_busy = true; err = mmc_wait_for_cmd(host, &cmd, MMC_CMD_RETRIES); + mmc_retune_hold(host); if (err) goto out;
this patch is refine for 'commit c6dbab9cb58f ("mmc: core: Hold re-tuning during switch commands")' Since it has 3 retries at max for CMD6, if the first CMD6 got CRC error, then should do re-tune before the next CMD6 was sent. Signed-off-by: Chaotian Jing <chaotian.jing@mediatek.com> --- drivers/mmc/core/mmc_ops.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)