Message ID | 1511271770-3444-8-git-send-email-adrian.hunter@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 21 November 2017 at 14:42, Adrian Hunter <adrian.hunter@intel.com> wrote: > card_busy_detect() has a 10 minute timeout. However the correct timeout is > the data timeout. Change card_busy_detect() to use the data timeout. Unfortunate I don't think there is "correct" timeout for this case. The data->timeout_ns is to indicate for the host to how long the maximum time it's allowed to take between blocks that are written to the data lines. I haven't found a definition of the busy timeout, after the data write has completed. The spec only mentions that the device moves to programming state and pulls DAT0 to indicate busy. Sure, 10 min seems crazy, perhaps something along the lines of 10-20 s is more reasonable. What do you think? Br Uffe > > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> > --- > drivers/mmc/core/block.c | 48 ++++++++++++++++++++++++++++++++++++++++-------- > 1 file changed, 40 insertions(+), 8 deletions(-) > > diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c > index e44f6d90aeb4..0874ab3e5c92 100644 > --- a/drivers/mmc/core/block.c > +++ b/drivers/mmc/core/block.c > @@ -63,7 +63,6 @@ > #endif > #define MODULE_PARAM_PREFIX "mmcblk." > > -#define MMC_BLK_TIMEOUT_MS (10 * 60 * 1000) /* 10 minute timeout */ > #define MMC_SANITIZE_REQ_TIMEOUT 240000 > #define MMC_EXTRACT_INDEX_FROM_ARG(x) ((x & 0x00FF0000) >> 16) > > @@ -921,14 +920,48 @@ static int mmc_sd_num_wr_blocks(struct mmc_card *card, u32 *written_blocks) > return 0; > } > > -static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms, > - bool hw_busy_detect, struct request *req, bool *gen_err) > +static unsigned int mmc_blk_clock_khz(struct mmc_host *host) > { > - unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms); > + if (host->actual_clock) > + return host->actual_clock / 1000; > + > + /* Clock may be subject to a divisor, fudge it by a factor of 2. */ > + if (host->ios.clock) > + return host->ios.clock / 2000; > + > + /* How can there be no clock */ > + WARN_ON_ONCE(1); > + return 100; /* 100 kHz is minimum possible value */ > +} > + > +static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host, > + struct mmc_data *data) > +{ > + unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000); > + unsigned int khz; > + > + if (data->timeout_clks) { > + khz = mmc_blk_clock_khz(host); > + ms += DIV_ROUND_UP(data->timeout_clks, khz); > + } > + > + return msecs_to_jiffies(ms); > +} > + > +static int card_busy_detect(struct mmc_card *card, bool hw_busy_detect, > + struct request *req, bool *gen_err) > +{ > + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); > + struct mmc_data *data = &mqrq->brq.data; > + unsigned long timeout; > int err = 0; > u32 status; > > + timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data); > + > do { > + bool done = time_after(jiffies, timeout); > + > err = __mmc_send_status(card, &status, 5); > if (err) { > pr_err("%s: error %d requesting status\n", > @@ -951,7 +984,7 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms, > * Timeout if the device never becomes ready for data and never > * leaves the program state. > */ > - if (time_after(jiffies, timeout)) { > + if (done) { > pr_err("%s: Card stuck in programming state! %s %s\n", > mmc_hostname(card->host), > req->rq_disk->disk_name, __func__); > @@ -1011,7 +1044,7 @@ static int send_stop(struct mmc_card *card, unsigned int timeout_ms, > *gen_err = true; > } > > - return card_busy_detect(card, timeout_ms, use_r1b_resp, req, gen_err); > + return card_busy_detect(card, use_r1b_resp, req, gen_err); > } > > #define ERR_NOMEDIUM 3 > @@ -1546,8 +1579,7 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card, > gen_err = true; > } > > - err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, > - &gen_err); > + err = card_busy_detect(card, false, req, &gen_err); > if (err) > return MMC_BLK_CMD_ERR; > } > -- > 1.9.1 >
On 21/11/17 17:39, Ulf Hansson wrote: > On 21 November 2017 at 14:42, Adrian Hunter <adrian.hunter@intel.com> wrote: >> card_busy_detect() has a 10 minute timeout. However the correct timeout is >> the data timeout. Change card_busy_detect() to use the data timeout. > > Unfortunate I don't think there is "correct" timeout for this case. > > The data->timeout_ns is to indicate for the host to how long the > maximum time it's allowed to take between blocks that are written to > the data lines. > > I haven't found a definition of the busy timeout, after the data write > has completed. The spec only mentions that the device moves to > programming state and pulls DAT0 to indicate busy. To me it reads more like the timeout is for each block, including the last i.e. the same timeout for "busy". Note the card is also busy between blocks. Equally it is the timeout we give the host controller. So either the host controller does not have a timeout for "busy" - which begs the question why it has a timeout at all - or it invents its own "busy" timeout - which begs the question why it isn't in the spec. > > Sure, 10 min seems crazy, perhaps something along the lines of 10-20 s > is more reasonable. What do you think? We give SD cards a generous 3 seconds for writes. SDHCI has long had a 10 second software timer for the whole request, which strongly suggests that requests have always completed within 10 seconds. So that puts the range of an arbitrary timeout 3-10 s.
On 22 November 2017 at 08:40, Adrian Hunter <adrian.hunter@intel.com> wrote: > On 21/11/17 17:39, Ulf Hansson wrote: >> On 21 November 2017 at 14:42, Adrian Hunter <adrian.hunter@intel.com> wrote: >>> card_busy_detect() has a 10 minute timeout. However the correct timeout is >>> the data timeout. Change card_busy_detect() to use the data timeout. >> >> Unfortunate I don't think there is "correct" timeout for this case. >> >> The data->timeout_ns is to indicate for the host to how long the >> maximum time it's allowed to take between blocks that are written to >> the data lines. >> >> I haven't found a definition of the busy timeout, after the data write >> has completed. The spec only mentions that the device moves to >> programming state and pulls DAT0 to indicate busy. > > To me it reads more like the timeout is for each block, including the last > i.e. the same timeout for "busy". Note the card is also busy between blocks. I don't think that is the same timeout. Or maybe it is. In the eMMC 5.1 spec, there are mentions about "buffer busy signal" and "programming busy signal", see section 6.15.3 (Timings - Data Write). Anyway, whether any of them is specified, is to me unclear. > > Equally it is the timeout we give the host controller. So either the host > controller does not have a timeout for "busy" - which begs the question why > it has a timeout at all - or it invents its own "busy" timeout - which begs > the question why it isn't in the spec. Well, there are some vague hints in section 6.8.2 (Time-out conditions), but I don't find these timeouts values being referred to in 6.15 (Timings). And that puzzles me. Moreover, the below is quoted from section 6.6.8.1 (Block write): ------ Some Devices may require long and unpredictable times to write a block of data. After receiving a block of data and completing the CRC check, the Device will begin writing and hold the DAT0 line low. The host may poll the status of the Device with a SEND_STATUS command (CMD13) at any time, and the Device will respond with its status (except in Sleep state). The status bit READY_FOR_DATA indicates whether the Device can accept new data or not. The host may deselect the Device by issuing CMD7 that will then displace the Device into the Disconnect State and release the DAT0 line without interrupting the write operation. When reselecting the Device, it will reactivate busy indication by pulling DAT0 to low. See 6.15 for details of busy indication. ------ > >> >> Sure, 10 min seems crazy, perhaps something along the lines of 10-20 s >> is more reasonable. What do you think? > > We give SD cards a generous 3 seconds for writes. SDHCI has long had a 10 > second software timer for the whole request, which strongly suggests that > requests have always completed within 10 seconds. So that puts the range of > an arbitrary timeout 3-10 s. From the reasoning above, I guess we could try out 10 s. That is at least a lot better than 10 minutes. I also see that we have at three different places (switch, erase, block data transfers) implementing busy signal detection. It would be nice to try to align those pieces of code, because they are quite similar. Of course, this deserves it's own separate task to try to fix up. BTW, perhaps we should move this to a separate change on top of the series? Or is there as specific need for this in regards to blkmq and CQE? Kind regards Uffe
On 22/11/17 16:43, Ulf Hansson wrote: > On 22 November 2017 at 08:40, Adrian Hunter <adrian.hunter@intel.com> wrote: >> On 21/11/17 17:39, Ulf Hansson wrote: >>> On 21 November 2017 at 14:42, Adrian Hunter <adrian.hunter@intel.com> wrote: >>>> card_busy_detect() has a 10 minute timeout. However the correct timeout is >>>> the data timeout. Change card_busy_detect() to use the data timeout. >>> >>> Unfortunate I don't think there is "correct" timeout for this case. >>> >>> The data->timeout_ns is to indicate for the host to how long the >>> maximum time it's allowed to take between blocks that are written to >>> the data lines. >>> >>> I haven't found a definition of the busy timeout, after the data write >>> has completed. The spec only mentions that the device moves to >>> programming state and pulls DAT0 to indicate busy. >> >> To me it reads more like the timeout is for each block, including the last >> i.e. the same timeout for "busy". Note the card is also busy between blocks. > > I don't think that is the same timeout. Or maybe it is. > > In the eMMC 5.1 spec, there are mentions about "buffer busy signal" > and "programming busy signal", see section 6.15.3 (Timings - Data > Write). > > Anyway, whether any of them is specified, is to me unclear. > >> >> Equally it is the timeout we give the host controller. So either the host >> controller does not have a timeout for "busy" - which begs the question why >> it has a timeout at all - or it invents its own "busy" timeout - which begs >> the question why it isn't in the spec. > > Well, there are some vague hints in section 6.8.2 (Time-out > conditions), but I don't find these timeouts values being referred to > in 6.15 (Timings). And that puzzles me. > > Moreover, the below is quoted from section 6.6.8.1 (Block write): > ------ > Some Devices may require long and unpredictable times to write a block > of data. After receiving a block of data and completing the CRC check, > the Device will begin writing and hold the DAT0 line low. The host may > poll the status of the Device with a SEND_STATUS command (CMD13) at > any time, and the Device will respond with its status (except in Sleep > state). The status bit READY_FOR_DATA indicates whether the Device can > accept new data or not. The host may deselect the Device by issuing > CMD7 that will then displace the Device into the Disconnect State and > release the DAT0 line without interrupting the write operation. When > reselecting the Device, it will reactivate busy indication by pulling > DAT0 to low. See 6.15 for details of busy indication. > ------ > >> >>> >>> Sure, 10 min seems crazy, perhaps something along the lines of 10-20 s >>> is more reasonable. What do you think? >> >> We give SD cards a generous 3 seconds for writes. SDHCI has long had a 10 >> second software timer for the whole request, which strongly suggests that >> requests have always completed within 10 seconds. So that puts the range of >> an arbitrary timeout 3-10 s. > >>From the reasoning above, I guess we could try out 10 s. That is at > least a lot better than 10 minutes. > > I also see that we have at three different places (switch, erase, > block data transfers) implementing busy signal detection. It would be > nice to try to align those pieces of code, because they are quite > similar. Of course, this deserves it's own separate task to try to fix > up. > > BTW, perhaps we should move this to a separate change on top of the > series? Or is there as specific need for this in regards to blkmq and > CQE? It is related to the recovery changes, so can be moved later in the patch set.
diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c index e44f6d90aeb4..0874ab3e5c92 100644 --- a/drivers/mmc/core/block.c +++ b/drivers/mmc/core/block.c @@ -63,7 +63,6 @@ #endif #define MODULE_PARAM_PREFIX "mmcblk." -#define MMC_BLK_TIMEOUT_MS (10 * 60 * 1000) /* 10 minute timeout */ #define MMC_SANITIZE_REQ_TIMEOUT 240000 #define MMC_EXTRACT_INDEX_FROM_ARG(x) ((x & 0x00FF0000) >> 16) @@ -921,14 +920,48 @@ static int mmc_sd_num_wr_blocks(struct mmc_card *card, u32 *written_blocks) return 0; } -static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms, - bool hw_busy_detect, struct request *req, bool *gen_err) +static unsigned int mmc_blk_clock_khz(struct mmc_host *host) { - unsigned long timeout = jiffies + msecs_to_jiffies(timeout_ms); + if (host->actual_clock) + return host->actual_clock / 1000; + + /* Clock may be subject to a divisor, fudge it by a factor of 2. */ + if (host->ios.clock) + return host->ios.clock / 2000; + + /* How can there be no clock */ + WARN_ON_ONCE(1); + return 100; /* 100 kHz is minimum possible value */ +} + +static unsigned long mmc_blk_data_timeout_jiffies(struct mmc_host *host, + struct mmc_data *data) +{ + unsigned int ms = DIV_ROUND_UP(data->timeout_ns, 1000000); + unsigned int khz; + + if (data->timeout_clks) { + khz = mmc_blk_clock_khz(host); + ms += DIV_ROUND_UP(data->timeout_clks, khz); + } + + return msecs_to_jiffies(ms); +} + +static int card_busy_detect(struct mmc_card *card, bool hw_busy_detect, + struct request *req, bool *gen_err) +{ + struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req); + struct mmc_data *data = &mqrq->brq.data; + unsigned long timeout; int err = 0; u32 status; + timeout = jiffies + mmc_blk_data_timeout_jiffies(card->host, data); + do { + bool done = time_after(jiffies, timeout); + err = __mmc_send_status(card, &status, 5); if (err) { pr_err("%s: error %d requesting status\n", @@ -951,7 +984,7 @@ static int card_busy_detect(struct mmc_card *card, unsigned int timeout_ms, * Timeout if the device never becomes ready for data and never * leaves the program state. */ - if (time_after(jiffies, timeout)) { + if (done) { pr_err("%s: Card stuck in programming state! %s %s\n", mmc_hostname(card->host), req->rq_disk->disk_name, __func__); @@ -1011,7 +1044,7 @@ static int send_stop(struct mmc_card *card, unsigned int timeout_ms, *gen_err = true; } - return card_busy_detect(card, timeout_ms, use_r1b_resp, req, gen_err); + return card_busy_detect(card, use_r1b_resp, req, gen_err); } #define ERR_NOMEDIUM 3 @@ -1546,8 +1579,7 @@ static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card, gen_err = true; } - err = card_busy_detect(card, MMC_BLK_TIMEOUT_MS, false, req, - &gen_err); + err = card_busy_detect(card, false, req, &gen_err); if (err) return MMC_BLK_CMD_ERR; }
card_busy_detect() has a 10 minute timeout. However the correct timeout is the data timeout. Change card_busy_detect() to use the data timeout. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> --- drivers/mmc/core/block.c | 48 ++++++++++++++++++++++++++++++++++++++++-------- 1 file changed, 40 insertions(+), 8 deletions(-)