Message ID | 1473817868-27812-2-git-send-email-shawn.lin@rock-chips.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 14 September 2016 at 03:51, Shawn Lin <shawn.lin@rock-chips.com> wrote: > We observed the failure of initializing card after resume > accidentally. It's hard to reproduce but we did get report from > the suspend/resume test of our RK3399 mp test farm . Unfortunately, > we still fail to figure out what was going wrong at that time. > Also we can't achieve it by retrying the host->f_init without > falling back it. But this patch will solve the problem as we could > add some log there and see that we resume the mmc card successfully > after falling back the host->f_init. > > I believe this is not a specific case for any host controllers as > it's possible the f_init used for this time may fail to init the > mmc for the next booting due to some unexpected signal interference > ,etc. Given that, the fallback for host->f_init when booting improve > the robustness. So this should be also beneficial to the resume case, > not just for sloving one specific issue. No, this becomes too much of hypothetical guesses. I don't like it, sorry! Let me elaborate on why, by browsing my comments below. > > [ 93.405085] mmc1: unexpected status 0x800900 after switch > [ 93.408474] mmc1: switch to bus width 1 failed > [ 93.408482] mmc1: mmc_select_hs200 failed, error -110 > [ 93.408492] mmc1: error -110 during resume (card was removed?) > [ 93.408705] PM: resume of devices complete after 213.453 msecs > > Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> > > --- > > Changes in v2: > - remove mmc_power_off > - take f_min into consideration > > drivers/mmc/core/mmc.c | 19 +++++++++++++++++-- > 1 file changed, 17 insertions(+), 2 deletions(-) > > diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c > index 3486bc7..06bebc1 100644 > --- a/drivers/mmc/core/mmc.c > +++ b/drivers/mmc/core/mmc.c > @@ -1963,6 +1963,7 @@ static int mmc_suspend(struct mmc_host *host) > static int _mmc_resume(struct mmc_host *host) > { > int err = 0; > + int i; > > BUG_ON(!host); > BUG_ON(!host->card); > @@ -1972,8 +1973,22 @@ static int _mmc_resume(struct mmc_host *host) > if (!mmc_card_suspended(host->card)) > goto out; > > - mmc_power_up(host, host->card->ocr); > - err = mmc_init_card(host, host->card->ocr, host->card); > + /* > + * Let's try to fallback the host->f_init > + * if failing to init mmc card after resume. > + */ > + for (i = 0; i < ARRAY_SIZE(freqs); i++) { > + if (host->f_init < max(freqs[i], host->f_min)) > + continue; Let's consider the following scenario: host->f_min = 100kHz and host->f_init = 400kHz. Which means earlier from mmc_rescan() when we first detected the card, we succeeded using 400kHz. I assume that would be somewhat similar to your use-case, right? In this example, we will hit the else clause and try to re-initialize the card using 400 kHz... > + else > + host->f_init = max(freqs[i], host->f_min); > + > + mmc_power_up(host, host->card->ocr); According to your description in the change log, I guess what you experienced is that the first re-initialization attempt by using 400 kHz fails. That's why you want try out 300 kHz in the next loop instead, right!? Unfortunate, this doesn't work, as in mmc_power_up() we bail out early, see pasted code snippet below. ----- if (host->ios.power_mode == MMC_POWER_ON) return; ----- So this change is actually completely wrong, as you will try to re-initialize the with card in the second attempt with a totally messed up ios setting created in the earlier attempt. It just can't work. > + err = mmc_init_card(host, host->card->ocr, host->card); > + if (!err) > + break; > + } > + > mmc_card_clr_suspended(host->card); So I was thinking a bit if we really need to achieve a more "robust" resume... I think the only option would be to call mmc_reset() when we notice that mmc_init_card() fails. I don't think we should need to change host->f_init. Moreover we should likely also check for MMC_CAP_NONREMOVABLE, as it's only in those cases when a reset makes sense. Now, if we intend to do this for mmc, we should do it for SD and SDIO as well. Kind regards Uffe -- To unsubscribe from this list: send the line "unsubscribe linux-mmc" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c index 3486bc7..06bebc1 100644 --- a/drivers/mmc/core/mmc.c +++ b/drivers/mmc/core/mmc.c @@ -1963,6 +1963,7 @@ static int mmc_suspend(struct mmc_host *host) static int _mmc_resume(struct mmc_host *host) { int err = 0; + int i; BUG_ON(!host); BUG_ON(!host->card); @@ -1972,8 +1973,22 @@ static int _mmc_resume(struct mmc_host *host) if (!mmc_card_suspended(host->card)) goto out; - mmc_power_up(host, host->card->ocr); - err = mmc_init_card(host, host->card->ocr, host->card); + /* + * Let's try to fallback the host->f_init + * if failing to init mmc card after resume. + */ + for (i = 0; i < ARRAY_SIZE(freqs); i++) { + if (host->f_init < max(freqs[i], host->f_min)) + continue; + else + host->f_init = max(freqs[i], host->f_min); + + mmc_power_up(host, host->card->ocr); + err = mmc_init_card(host, host->card->ocr, host->card); + if (!err) + break; + } + mmc_card_clr_suspended(host->card); out:
We observed the failure of initializing card after resume accidentally. It's hard to reproduce but we did get report from the suspend/resume test of our RK3399 mp test farm . Unfortunately, we still fail to figure out what was going wrong at that time. Also we can't achieve it by retrying the host->f_init without falling back it. But this patch will solve the problem as we could add some log there and see that we resume the mmc card successfully after falling back the host->f_init. I believe this is not a specific case for any host controllers as it's possible the f_init used for this time may fail to init the mmc for the next booting due to some unexpected signal interference ,etc. Given that, the fallback for host->f_init when booting improve the robustness. So this should be also beneficial to the resume case, not just for sloving one specific issue. [ 93.405085] mmc1: unexpected status 0x800900 after switch [ 93.408474] mmc1: switch to bus width 1 failed [ 93.408482] mmc1: mmc_select_hs200 failed, error -110 [ 93.408492] mmc1: error -110 during resume (card was removed?) [ 93.408705] PM: resume of devices complete after 213.453 msecs Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> --- Changes in v2: - remove mmc_power_off - take f_min into consideration drivers/mmc/core/mmc.c | 19 +++++++++++++++++-- 1 file changed, 17 insertions(+), 2 deletions(-)