[v2] mmc: dw_mmc: Make sure we don't get stuck when we get an error

Message ID 1400623670-2657-1-git-send-email-dianders@chromium.org (mailing list archive)
State New, archived

Commit Message

Doug Anderson May 20, 2014, 10:07 p.m. UTC
If we happened to get a data error at just the wrong time, the dw_mmc
driver could get into a state where it would never complete its
request.  That would leave the caller hanging forever.

We fix this in two ways; either fix on its own appears to resolve the
problems we've seen:

1. Fix a race in the tasklet where the interrupt setting the data
   error happens _just after_ we check for it, and we then get an
   EVENT_XFER_COMPLETE.  We fix this by repeating the error check
   after the transfer completes.
2. If we detect an error in the "data busy" state and no command
   complete is pending (so nothing else will happen), end the
   request and unblock anyone waiting.

Signed-off-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd@gmail.com>
---
Changes in v2:
- Removed TODO
- Set cmd to NULL before calling dw_mci_request_end()

 drivers/mmc/host/dw_mmc.c | 46 ++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 46 insertions(+)
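
The race behind fix 1 can be sketched as a small user-space model. This is
an illustration only, not driver code: the model_* names, the bit values and
the irq_between_checks hook are all hypothetical, and the transfer-complete
check that sits between the two hunks is inferred from the comment the patch
adds rather than shown in the diff.

```c
#include <stdbool.h>

/* Event bits; names mirror the patch, the values are made up for this model. */
enum { EVENT_XFER_COMPLETE = 1u << 0, EVENT_DATA_ERROR = 1u << 1 };

enum model_state { STATE_SENDING_DATA, STATE_DATA_BUSY, STATE_DATA_ERROR };

struct model_host {
	unsigned int pending_events;
	/* Hypothetical hook: simulates an interrupt landing between the checks. */
	void (*irq_between_checks)(struct model_host *host);
};

/* Non-atomic stand-in for the kernel's test_and_clear_bit(). */
static bool model_test_and_clear(unsigned int *events, unsigned int bit)
{
	bool was_set = (*events & bit) != 0;

	*events &= ~bit;
	return was_set;
}

/*
 * One pass through the SENDING_DATA case.  With recheck == false this
 * follows the pre-patch flow; with recheck == true it repeats the error
 * check after transfer complete, as the patch does.
 */
static enum model_state model_sending_data(struct model_host *host, bool recheck)
{
	if (model_test_and_clear(&host->pending_events, EVENT_DATA_ERROR))
		return STATE_DATA_ERROR;

	/* The window in which the data error interrupt can arrive. */
	if (host->irq_between_checks)
		host->irq_between_checks(host);

	if (!model_test_and_clear(&host->pending_events, EVENT_XFER_COMPLETE))
		return STATE_SENDING_DATA;	/* nothing to do yet */

	if (recheck &&
	    model_test_and_clear(&host->pending_events, EVENT_DATA_ERROR))
		return STATE_DATA_ERROR;

	return STATE_DATA_BUSY;
}

/* Simulated interrupt: a data error followed by transfer complete. */
static void model_late_error_irq(struct model_host *host)
{
	host->pending_events |= EVENT_DATA_ERROR | EVENT_XFER_COMPLETE;
}
```

Calling model_sending_data() on a host whose hook is model_late_error_irq
moves to STATE_DATA_BUSY with the error bit still pending when recheck is
false, and to STATE_DATA_ERROR when recheck is true.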

Comments

Seungwon Jeon May 21, 2014, 9:08 a.m. UTC | #1
On Wed, May 21, 2014, Doug Anderson wrote:
> If we happened to get a data error at just the wrong time the dw_mmc
> driver could get into a state where it would never complete its
> request.  That would leave the caller just hanging there.
> 
> We fix this two ways and both of the two fixes on their own appear to
> fix the problems we've seen:
> 
> 1. Fix a race in the tasklet where the interrupt setting the data
>    error happens _just after_ we check for it, then we get a
>    EVENT_XFER_COMPLETE.  We fix this by repeating a bit of code.
> 2. Fix it so that if we detect that we've got an error in the "data
>    busy" state and we're not going to do anything else we end the
>    request and unblock anyone waiting.
> 
> Signed-off-by: Doug Anderson <dianders@chromium.org>
> Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd@gmail.com>

It will be applied after "mmc: dw_mmc: change to use recommended reset procedure"

Acked-by: Seungwon Jeon <tgih.jun@samsung.com>

Thanks,
Seungwon Jeon

--
To unsubscribe from this list: send the line "unsubscribe linux-samsung-soc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Seungwon Jeon July 27, 2014, 2:15 p.m. UTC | #2
Hi Chris & Ulf,

I hope you can pick up this patch for next.

Thanks,
Seungwon Jeon

Doug Anderson Aug. 13, 2014, 1:38 p.m. UTC | #3
Hi,

On Wed, May 21, 2014 at 2:08 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
> On Wed, May 21, 2014, Doug Anderson wrote:
>> If we happened to get a data error at just the wrong time the dw_mmc
>> driver could get into a state where it would never complete its
>> request.  That would leave the caller just hanging there.
>>
>> We fix this two ways and both of the two fixes on their own appear to
>> fix the problems we've seen:
>>
>> 1. Fix a race in the tasklet where the interrupt setting the data
>>    error happens _just after_ we check for it, then we get a
>>    EVENT_XFER_COMPLETE.  We fix this by repeating a bit of code.
>> 2. Fix it so that if we detect that we've got an error in the "data
>>    busy" state and we're not going to do anything else we end the
>>    request and unblock anyone waiting.
>>
>> Signed-off-by: Doug Anderson <dianders@chromium.org>
>> Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd@gmail.com>
>
> It will be applied after "mmc: dw_mmc: change to use recommended reset procedure"
>
> Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
>
> Thanks,
> Seungwon Jeon

I saw that Ulf applied "mmc: dw_mmc: change to use recommended reset
procedure".  Could we apply this one now, too?  Do you want me to
repost?

-Doug
Jaehoon Chung Aug. 13, 2014, 1:52 p.m. UTC | #4
On 08/13/2014 10:38 PM, Doug Anderson wrote:
> Hi,
> 
> On Wed, May 21, 2014 at 2:08 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
>> On Wed, May 21, 2014, Doug Anderson wrote:
>>> If we happened to get a data error at just the wrong time the dw_mmc
>>> driver could get into a state where it would never complete its
>>> request.  That would leave the caller just hanging there.
>>>
>>> We fix this two ways and both of the two fixes on their own appear to
>>> fix the problems we've seen:
>>>
>>> 1. Fix a race in the tasklet where the interrupt setting the data
>>>    error happens _just after_ we check for it, then we get a
>>>    EVENT_XFER_COMPLETE.  We fix this by repeating a bit of code.
>>> 2. Fix it so that if we detect that we've got an error in the "data
>>>    busy" state and we're not going to do anything else we end the
>>>    request and unblock anyone waiting.
>>>
>>> Signed-off-by: Doug Anderson <dianders@chromium.org>
>>> Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd@gmail.com>
>>
>> It will be applied after "mmc: dw_mmc: change to use recommended reset procedure"
>>
>> Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
>>
>> Thanks,
>> Seungwon Jeon
> 
> I saw that Ulf applied "mmc: dw_mmc: change to use recommended reset
> procedure".  Could we apply this one now, too?  Do you want me to
> repost?
It would be good if this were merged along with it.

Best Regards,
Jaehoon Chung

> 
> -Doug
Ulf Hansson Aug. 13, 2014, 3:04 p.m. UTC | #5
On 13 August 2014 15:38, Doug Anderson <dianders@chromium.org> wrote:
> Hi,
>
> On Wed, May 21, 2014 at 2:08 AM, Seungwon Jeon <tgih.jun@samsung.com> wrote:
>> On Wed, May 21, 2014, Doug Anderson wrote:
>>> If we happened to get a data error at just the wrong time the dw_mmc
>>> driver could get into a state where it would never complete its
>>> request.  That would leave the caller just hanging there.
>>>
>>> We fix this two ways and both of the two fixes on their own appear to
>>> fix the problems we've seen:
>>>
>>> 1. Fix a race in the tasklet where the interrupt setting the data
>>>    error happens _just after_ we check for it, then we get a
>>>    EVENT_XFER_COMPLETE.  We fix this by repeating a bit of code.
>>> 2. Fix it so that if we detect that we've got an error in the "data
>>>    busy" state and we're not going to do anything else we end the
>>>    request and unblock anyone waiting.
>>>
>>> Signed-off-by: Doug Anderson <dianders@chromium.org>
>>> Signed-off-by: Yuvaraj Kumar C D <yuvaraj.cd@gmail.com>
>>
>> It will be applied after "mmc: dw_mmc: change to use recommended reset procedure"
>>
>> Acked-by: Seungwon Jeon <tgih.jun@samsung.com>
>>
>> Thanks,
>> Seungwon Jeon
>
> I saw that Ulf applied "mmc: dw_mmc: change to use recommended reset
> procedure".  Could we apply this one now, too?  Do you want me to
> repost?

Please repost and rebase if needed.

Kind regards
Uffe
>
> -Doug

Patch

diff --git a/drivers/mmc/host/dw_mmc.c b/drivers/mmc/host/dw_mmc.c
index cced599..54ec8b0 100644
--- a/drivers/mmc/host/dw_mmc.c
+++ b/drivers/mmc/host/dw_mmc.c
@@ -1318,6 +1318,14 @@ static void dw_mci_tasklet_func(unsigned long priv)
 			/* fall through */
 
 		case STATE_SENDING_DATA:
+			/*
+			 * We could get a data error and never a transfer
+			 * complete so we'd better check for it here.
+			 *
+			 * Note that we don't really care if we also got a
+			 * transfer complete; stopping the DMA and sending an
+			 * abort won't hurt.
+			 */
 			if (test_and_clear_bit(EVENT_DATA_ERROR,
 					       &host->pending_events)) {
 				dw_mci_stop_dma(host);
@@ -1331,7 +1339,29 @@ static void dw_mci_tasklet_func(unsigned long priv)
 				break;
 
 			set_bit(EVENT_XFER_COMPLETE, &host->completed_events);
+
+			/*
+			 * Handle an EVENT_DATA_ERROR that might have shown up
+			 * before the transfer completed.  This might not have
+			 * been caught by the check above because the interrupt
+			 * could have gone off between the previous check and
+			 * the check for transfer complete.
+			 *
+			 * Technically this ought not be needed assuming we
+			 * get a DATA_COMPLETE eventually (we'll notice the
+			 * error and end the request), but it shouldn't hurt.
+			 *
+			 * This has the advantage of sending the stop command.
+			 */
+			if (test_and_clear_bit(EVENT_DATA_ERROR,
+					       &host->pending_events)) {
+				dw_mci_stop_dma(host);
+				send_stop_abort(host, data);
+				state = STATE_DATA_ERROR;
+				break;
+			}
 			prev_state = state = STATE_DATA_BUSY;
+
 			/* fall through */
 
 		case STATE_DATA_BUSY:
@@ -1354,6 +1384,22 @@ static void dw_mci_tasklet_func(unsigned long priv)
 				/* stop command for open-ended transfer*/
 				if (data->stop)
 					send_stop_abort(host, data);
+			} else {
+				/*
+				 * If we don't have a command complete now we'll
+				 * never get one since we just reset everything;
+				 * better end the request.
+				 *
+				 * If we do have a command complete we'll fall
+				 * through to the SENDING_STOP state and
+				 * everything will be peachy keen.
+				 */
+				if (!test_bit(EVENT_CMD_COMPLETE,
+					      &host->pending_events)) {
+					host->cmd = NULL;
+					dw_mci_request_end(host, mrq);
+					goto unlock;
+				}
 			}
 
 			/*
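
The decision added in the last hunk (the error branch of the "data busy"
state) can be sketched in isolation as below. This is a simplified
user-space model, not the driver: struct busy_model, busy_error_next() and
the request_ended flag are hypothetical stand-ins, with request_ended
marking the point where the driver would call dw_mci_request_end().

```c
#include <stdbool.h>
#include <stddef.h>

enum { EVENT_CMD_COMPLETE = 1u << 0 };	/* bit value is arbitrary in this model */

enum busy_next { NEXT_SENDING_STOP, NEXT_REQUEST_ENDED };

struct busy_model {
	unsigned int pending_events;
	const void *cmd;	/* stands in for host->cmd */
	bool request_ended;	/* set where the driver would end the request */
};

/*
 * Error path of the "data busy" state.  If no command complete is
 * pending, none will arrive later, so the request is ended right away.
 * Otherwise the state machine carries on to the stop-sending state.
 */
static enum busy_next busy_error_next(struct busy_model *host)
{
	if (!(host->pending_events & EVENT_CMD_COMPLETE)) {
		host->cmd = NULL;	/* cleared before ending, per the v2 changelog */
		host->request_ended = true;
		return NEXT_REQUEST_ENDED;
	}

	return NEXT_SENDING_STOP;
}
```

With pending_events == 0 the model ends the request and clears cmd; with
EVENT_CMD_COMPLETE set it leaves both untouched and reports
NEXT_SENDING_STOP.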