diff mbox

[V13,10/10] mmc: block: blk-mq: Stop using legacy recovery

Message ID 1509715220-31885-11-git-send-email-adrian.hunter@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Adrian Hunter Nov. 3, 2017, 1:20 p.m. UTC
There are only a few things the recovery needs to do. Primarily, it just
needs to:
	Determine the number of bytes transferred
	Get the card back to transfer state
	Determine whether to retry

There are also a couple of additional features:
	Reset the card before the last retry
	Read one sector at a time

The legacy code spent much effort analyzing command errors, but commands
fail fast, so it is simpler just to give all command errors the same number
of retries.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 drivers/mmc/core/block.c | 261 ++++++++++++++++++++++++-----------------------
 1 file changed, 135 insertions(+), 126 deletions(-)

Comments

Linus Walleij Nov. 8, 2017, 9:38 a.m. UTC | #1
On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:

> There are only a few things the recovery needs to do. Primarily, it just
> needs to:
>         Determine the number of bytes transferred
>         Get the card back to transfer state
>         Determine whether to retry
>
> There are also a couple of additional features:
>         Reset the card before the last retry
>         Read one sector at a time
>
> The legacy code spent much effort analyzing command errors, but commands
> fail fast, so it is simpler just to give all command errors the same number
> of retries.
>
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>

I have nothing against the patch as such. In fact something
like this makes a lot of sense (to me).

But this just makes mmc_blk_rw_recovery() look really nice.

And leaves a very ugly mmc_blk_issue_rw_rq() with the legacy
error handling in-tree.

The former function isn't even named with some *mq* infix
making it clear that the new recovery path only happens
in the MQ case.

If newcomers read this code in the MMC stack they will
just tear their hair, scream and run away. Even faster than
before.

How are they supposed to know which functions are used on
which path? Run ftrace?

This illustrates firmly why we need to refactor and/or kill off
the old block layer interface *first* then add MQ on top.

Yours,
Linus Walleij
Adrian Hunter Nov. 9, 2017, 7:43 a.m. UTC | #2
On 08/11/17 11:38, Linus Walleij wrote:
> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> There are only a few things the recovery needs to do. Primarily, it just
>> needs to:
>>         Determine the number of bytes transferred
>>         Get the card back to transfer state
>>         Determine whether to retry
>>
>> There are also a couple of additional features:
>>         Reset the card before the last retry
>>         Read one sector at a time
>>
>> The legacy code spent much effort analyzing command errors, but commands
>> fail fast, so it is simpler just to give all command errors the same number
>> of retries.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> I have nothing against the patch as such. In fact something
> like this makes a lot of sense (to me).
> 
> But this just makes mmc_blk_rw_recovery() look really nice.
> 
> And leaves a very ugly mmc_blk_issue_rw_rq() with the legacy
> error handling in-tree.
> 
> The former function isn't even named with some *mq* infix
> making it clear that the new recovery path only happens
> in the MQ case.
> 
> If newcomers read this code in the MMC stack they will
> just tear their hair, scream and run away. Even faster than
> before.
> 
> How are they supposed to know which functions are used on
> which path? Run ftrace?

You're kidding me right?  You don't know how to find where a function used?

> This illustrates firmly why we need to refactor and/or kill off
> the old block layer interface *first* then add MQ on top.

No it doesn't!  You are playing games!  One function could be named
differently, so that is evidence the whole patch set should be ignored.

The old code is rubbish.  There is nothing worth keeping.  Churning it
around is a waste of everybody's time.  Review and test the new code.
Delete the old code.  Much much simpler!
Linus Walleij Nov. 9, 2017, 12:45 p.m. UTC | #3
On Thu, Nov 9, 2017 at 8:43 AM, Adrian Hunter <adrian.hunter@intel.com> wrote:
> On 08/11/17 11:38, Linus Walleij wrote:
>> On Fri, Nov 3, 2017 at 2:20 PM, Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>>> There are only a few things the recovery needs to do. Primarily, it just
>>> needs to:
>>>         Determine the number of bytes transferred
>>>         Get the card back to transfer state
>>>         Determine whether to retry
>>>
>>> There are also a couple of additional features:
>>>         Reset the card before the last retry
>>>         Read one sector at a time
>>>
>>> The legacy code spent much effort analyzing command errors, but commands
>>> fail fast, so it is simpler just to give all command errors the same number
>>> of retries.
>>>
>>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>>
>> I have nothing against the patch as such. In fact something
>> like this makes a lot of sense (to me).
>>
>> But this just makes mmc_blk_rw_recovery() look really nice.
>>
>> And leaves a very ugly mmc_blk_issue_rw_rq() with the legacy
>> error handling in-tree.
>>
>> The former function isn't even named with some *mq* infix
>> making it clear that the new recovery path only happens
>> in the MQ case.
>>
>> If newcomers read this code in the MMC stack they will
>> just tear their hair, scream and run away. Even faster than
>> before.
>>
>> How are they supposed to know which functions are used on
>> which path? Run ftrace?
>
> You're kidding me right?  You don't know how to find where a function used?

What I mean is that there are now several functions in the same
file doing similar things, and it is pretty hard for a newcomer
already (IMO) to understand how the MMC/SD stack works.

This phenomenon of code complexity is not just making me
frustrated but also you, because you get annoyed that people
like me don't "just get it".

>> This illustrates firmly why we need to refactor and/or kill off
>> the old block layer interface *first* then add MQ on top.
>
> No it doesn't!  You are playing games!

You need to stop your snarky and undfriendly attitude.

Look Adrian: you have one (1) person reviewing your patches.
That is me.

I think need to quote Documentation/process/6.Followthrough.rst
in verbatim (it's an awesome piece of text!)

----8<--------8<-------8<------

Working with reviewers
----------------------

A patch of any significance will result in a number of comments from other
developers as they review the code.  Working with reviewers can be, for
many developers, the most intimidating part of the kernel development
process.  Life can be made much easier, though, if you keep a few things in
mind:

 - If you have explained your patch well, reviewers will understand its
   value and why you went to the trouble of writing it.  But that value
   will not keep them from asking a fundamental question: what will it be
   like to maintain a kernel with this code in it five or ten years later?
   Many of the changes you may be asked to make - from coding style tweaks
   to substantial rewrites - come from the understanding that Linux will
   still be around and under development a decade from now.

 - Code review is hard work, and it is a relatively thankless occupation;
   people remember who wrote kernel code, but there is little lasting fame
   for those who reviewed it.  So reviewers can get grumpy, especially when
   they see the same mistakes being made over and over again.  If you get a
   review which seems angry, insulting, or outright offensive, resist the
   impulse to respond in kind.  Code review is about the code, not about
   the people, and code reviewers are not attacking you personally.

 - Similarly, code reviewers are not trying to promote their employers'
   agendas at the expense of your own.  Kernel developers often expect to
   be working on the kernel years from now, but they understand that their
   employer could change.  They truly are, almost without exception,
   working toward the creation of the best kernel they can; they are not
   trying to create discomfort for their employers' competitors.

What all of this comes down to is that, when reviewers send you comments,
you need to pay attention to the technical observations that they are
making.  Do not let their form of expression or your own pride keep that
from happening.  When you get review comments on a patch, take the time to
understand what the reviewer is trying to say.  If possible, fix the things
that the reviewer is asking you to fix.  And respond back to the reviewer:
thank them, and describe how you will answer their questions.

----8<--------8<-------8<------

> One function could be named
> differently, so that is evidence the whole patch set should be ignored.

No I was not pointing to that at all.

> The old code is rubbish.  There is nothing worth keeping.  Churning it
> around is a waste of everybody's time.  Review and test the new code.
> Delete the old code.  Much much simpler!

Yeah, but you do not delete the old code in your patch
set so why are you are saying this?

The essence of my response to patch 3 was exactly that if
you have this "my way or the highway" (i.e. delete all the old
code paths) attitude to the code (which I kind of understand)
Then go all in and actually delete it.

As it is, here, at the end of the patch set, the old legacy block
path and combersome error handling still exists.

Yours,
Linus Walleij
diff mbox

Patch

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 5c5ff3c34313..623fa2be7077 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1480,9 +1480,11 @@  static void mmc_blk_eval_resp_error(struct mmc_blk_request *brq)
 	}
 }
 
-static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
-					       struct mmc_queue_req *mq_mrq)
+static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
+					     struct mmc_async_req *areq)
 {
+	struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
+						    areq);
 	struct mmc_blk_request *brq = &mq_mrq->brq;
 	struct request *req = mmc_queue_req_to_req(mq_mrq);
 	int need_retune = card->host->need_retune;
@@ -1588,15 +1590,6 @@  static enum mmc_blk_status __mmc_blk_err_check(struct mmc_card *card,
 	return MMC_BLK_SUCCESS;
 }
 
-static enum mmc_blk_status mmc_blk_err_check(struct mmc_card *card,
-					     struct mmc_async_req *areq)
-{
-	struct mmc_queue_req *mq_mrq = container_of(areq, struct mmc_queue_req,
-						    areq);
-
-	return __mmc_blk_err_check(card, mq_mrq);
-}
-
 static void mmc_blk_data_prep(struct mmc_queue *mq, struct mmc_queue_req *mqrq,
 			      int disable_multi, bool *do_rel_wr_p,
 			      bool *do_data_tag_p)
@@ -1922,6 +1915,7 @@  static void mmc_blk_rw_rq_prep(struct mmc_queue_req *mqrq,
 }
 
 #define MMC_MAX_RETRIES		5
+#define MMC_DATA_RETRIES	2
 #define MMC_NO_RETRIES		(MMC_MAX_RETRIES + 1)
 
 /* Single sector read during recovery */
@@ -1974,6 +1968,28 @@  static inline bool mmc_blk_in_tran_state(u32 status)
 	       (R1_CURRENT_STATE(status) == R1_STATE_TRAN);
 }
 
+/*
+ * Check for errors the host controller driver might not have seen such as
+ * response mode errors or invalid card state.
+ */
+static bool mmc_blk_status_error(struct request *req, u32 status)
+{
+	struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
+	struct mmc_blk_request *brq = &mqrq->brq;
+	u32 stop_err_bits = mmc_blk_stop_err_bits(brq);
+
+	return brq->cmd.resp[0]  & CMD_ERRORS    ||
+	       brq->stop.resp[0] & stop_err_bits ||
+	       status            & stop_err_bits ||
+	       (rq_data_dir(req) == WRITE && !mmc_blk_in_tran_state(status));
+}
+
+static inline bool mmc_blk_cmd_started(struct mmc_blk_request *brq)
+{
+	return !brq->sbc.error && !brq->cmd.error &&
+	       !(brq->cmd.resp[0] & CMD_ERRORS);
+}
+
 static unsigned int mmc_blk_clock_khz(struct mmc_host *host)
 {
 	if (host->actual_clock)
@@ -2043,6 +2059,47 @@  static int mmc_blk_card_stuck(struct mmc_card *card, struct request *req,
 	return err;
 }
 
+static int mmc_blk_send_stop(struct mmc_card *card)
+{
+	struct mmc_command cmd = {
+		.opcode = MMC_STOP_TRANSMISSION,
+		.flags = MMC_RSP_SPI_R1 | MMC_RSP_R1 | MMC_CMD_AC,
+	};
+
+	return mmc_wait_for_cmd(card->host, &cmd, 5);
+}
+
+static int mmc_blk_fix_state(struct mmc_card *card, struct request *req)
+{
+	int err;
+
+	mmc_retune_hold_now(card->host);
+
+	mmc_blk_send_stop(card);
+
+	err = mmc_blk_card_stuck(card, req, NULL);
+
+	mmc_retune_release(card->host);
+
+	return err;
+}
+
+/*
+ * Requests are completed by mmc_blk_mq_complete_rq() which sets simple
+ * policy:
+ * 1. A request that has transferred at least some data is considered
+ * successful and will be requeued if there is remaining data to
+ * transfer.
+ * 2. Otherwise the number of retries is incremented and the request
+ * will be requeued if there are remaining retries.
+ * 3. Otherwise the request will be errored out.
+ * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and
+ * mqrq->retries. So there are only 4 possible actions here:
+ *	1. do not accept the bytes_xfered value i.e. set it to zero
+ *	2. change mqrq->retries to determine the number of retries
+ *	3. try to reset the card
+ *	4. read one sector at a time
+ */
 static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
 {
 	int type = rq_data_dir(req) == READ ? MMC_BLK_READ : MMC_BLK_WRITE;
@@ -2050,131 +2107,83 @@  static void mmc_blk_rw_recovery(struct mmc_queue *mq, struct request *req)
 	struct mmc_blk_request *brq = &mqrq->brq;
 	struct mmc_blk_data *md = mq->blkdata;
 	struct mmc_card *card = mq->card;
-	static enum mmc_blk_status status;
-
-	brq->retune_retry_done = mqrq->retries;
+	u32 status;
+	u32 blocks;
+	int err;
 
-	status = __mmc_blk_err_check(card, mqrq);
+	/*
+	 * Some errors the host driver might not have seen. Set the number of
+	 * bytes transferred to zero in that case.
+	 */
+	err = __mmc_send_status(card, &status, 0);
+	if (err || mmc_blk_status_error(req, status))
+		brq->data.bytes_xfered = 0;
 
 	mmc_retune_release(card->host);
 
 	/*
-	 * Requests are completed by mmc_blk_mq_complete_rq() which sets simple
-	 * policy:
-	 * 1. A request that has transferred at least some data is considered
-	 * successful and will be requeued if there is remaining data to
-	 * transfer.
-	 * 2. Otherwise the number of retries is incremented and the request
-	 * will be requeued if there are remaining retries.
-	 * 3. Otherwise the request will be errored out.
-	 * That means mmc_blk_mq_complete_rq() is controlled by bytes_xfered and
-	 * mqrq->retries. So there are only 4 possible actions here:
-	 *	1. do not accept the bytes_xfered value i.e. set it to zero
-	 *	2. change mqrq->retries to determine the number of retries
-	 *	3. try to reset the card
-	 *	4. read one sector at a time
+	 * Try again to get the status. This also provides an opportunity for
+	 * re-tuning.
 	 */
-	switch (status) {
-	case MMC_BLK_SUCCESS:
-	case MMC_BLK_PARTIAL:
-		/* Reset success, and accept bytes_xfered */
-		mmc_blk_reset_success(md, type);
-		break;
-	case MMC_BLK_CMD_ERR:
-		/*
-		 * For SD cards, get bytes written, but do not accept
-		 * bytes_xfered if that fails. For MMC cards accept
-		 * bytes_xfered. Then try to reset. If reset fails then
-		 * error out the remaining request, otherwise retry
-		 * once (N.B mmc_blk_reset() will not succeed twice in a
-		 * row).
-		 */
-		if (mmc_card_sd(card)) {
-			u32 blocks;
-			int err;
+	if (err)
+		err = __mmc_send_status(card, &status, 0);
 
-			err = mmc_sd_num_wr_blocks(card, &blocks);
-			if (err)
-				brq->data.bytes_xfered = 0;
-			else
-				brq->data.bytes_xfered = blocks << 9;
-		}
-		if (mmc_blk_reset(md, card->host, type))
-			mqrq->retries = MMC_NO_RETRIES;
-		else
-			mqrq->retries = MMC_MAX_RETRIES - 1;
-		break;
-	case MMC_BLK_RETRY:
-		/*
-		 * Do not accept bytes_xfered, but retry up to 5 times,
-		 * otherwise same as abort.
-		 */
-		brq->data.bytes_xfered = 0;
-		if (mqrq->retries < MMC_MAX_RETRIES)
-			break;
-		/* Fall through */
-	case MMC_BLK_ABORT:
-		/*
-		 * Do not accept bytes_xfered, but try to reset. If
-		 * reset succeeds, try once more, otherwise error out
-		 * the request.
-		 */
-		brq->data.bytes_xfered = 0;
-		if (mmc_blk_reset(md, card->host, type))
-			mqrq->retries = MMC_NO_RETRIES;
-		else
-			mqrq->retries = MMC_MAX_RETRIES - 1;
-		break;
-	case MMC_BLK_DATA_ERR: {
-		int err;
+	/*
+	 * Nothing more to do after the number of bytes transferred has been
+	 * updated and there is no card.
+	 */
+	if (err && mmc_detect_card_removed(card->host))
+		return;
 
-		/*
-		 * Do not accept bytes_xfered, but try to reset. If
-		 * reset succeeds, try once more. If reset fails with
-		 * ENODEV which means the partition is wrong, then error
-		 * out the request. Otherwise attempt to read one sector
-		 * at a time.
-		 */
-		brq->data.bytes_xfered = 0;
-		err = mmc_blk_reset(md, card->host, type);
-		if (!err) {
-			mqrq->retries = MMC_MAX_RETRIES - 1;
-			break;
-		}
-		if (err == -ENODEV) {
-			mqrq->retries = MMC_NO_RETRIES;
-			break;
-		}
-		/* Fall through */
+	/* Try to get back to "tran" state */
+	if (err || !mmc_blk_in_tran_state(status))
+		err = mmc_blk_fix_state(mq->card, req);
+
+	/*
+	 * Special case for SD cards where the card might record the number of
+	 * blocks written.
+	 */
+	if (!err && mmc_blk_cmd_started(brq) && mmc_card_sd(card) &&
+	    rq_data_dir(req) == WRITE) {
+		if (mmc_sd_num_wr_blocks(card, &blocks))
+			brq->data.bytes_xfered = 0;
+		else
+			brq->data.bytes_xfered = blocks << 9;
 	}
-	case MMC_BLK_ECC_ERR:
-		/*
-		 * Do not accept bytes_xfered. If reading more than one
-		 * sector, try reading one sector at a time.
-		 */
-		brq->data.bytes_xfered = 0;
-		/* FIXME: Missing single sector read for large sector size */
-		if (brq->data.blocks > 1 && !mmc_large_sector(card)) {
-			/* Redo read one sector at a time */
-			pr_warn("%s: retrying using single block read\n",
-				req->rq_disk->disk_name);
-			mmc_blk_ss_read(mq, req);
-		} else {
-			mqrq->retries = MMC_NO_RETRIES;
-		}
-		break;
-	case MMC_BLK_NOMEDIUM:
-		/* Do not accept bytes_xfered. Error out the request */
-		brq->data.bytes_xfered = 0;
-		mqrq->retries = MMC_NO_RETRIES;
-		break;
-	default:
-		/* Do not accept bytes_xfered. Error out the request */
-		brq->data.bytes_xfered = 0;
+
+	/* Reset if the card is in a bad state */
+	if (err && mmc_blk_reset(md, card->host, type)) {
+		pr_err("%s: recovery failed!\n", req->rq_disk->disk_name);
 		mqrq->retries = MMC_NO_RETRIES;
-		pr_err("%s: Unhandled return value (%d)",
-		       req->rq_disk->disk_name, status);
-		break;
+		return;
+	}
+
+	/*
+	 * If anything was done, just return and if there is anything remaining
+	 * on the request it will get requeued.
+	 */
+	if (brq->data.bytes_xfered)
+		return;
+
+	/* Reset before last retry */
+	if (mqrq->retries + 1 == MMC_MAX_RETRIES)
+		mmc_blk_reset(md, card->host, type);
+
+	/* Command errors fail fast, so use all MMC_MAX_RETRIES */
+	if (brq->sbc.error || brq->cmd.error)
+		return;
+
+	/* Reduce the remaining retries for data errors */
+	if (mqrq->retries < MMC_MAX_RETRIES - MMC_DATA_RETRIES) {
+		mqrq->retries = MMC_MAX_RETRIES - MMC_DATA_RETRIES;
+		return;
+	}
+
+	/* FIXME: Missing single sector read for large sector size */
+	if (rq_data_dir(req) == READ && !mmc_large_sector(card)) {
+		/* Read one sector at a time */
+		mmc_blk_ss_read(mq, req);
+		return;
 	}
 }