diff mbox series

mmc-utils: Add basic erase error check

Message ID 16c9b85406034bd6b3c526070b9fd995@hyperstone.com (mailing list archive)
State New, archived
Headers show
Series mmc-utils: Add basic erase error check | expand

Commit Message

Christian Loehle Dec. 12, 2022, 9:21 a.m. UTC
Check for erase specific R1 errors so e.g. an OOR erase is not
reported as successful when it never executed.

There could be checks for more error bits but R1_ERASE_SEQ_ERROR
on CMD38 should catch all that are reported by hardware anyway.

Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
---
 mmc_cmds.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

Comments

Avri Altman Dec. 12, 2022, 1:44 p.m. UTC | #1
> Check for erase specific R1 errors so e.g. an OOR erase is not
> reported as successful when it never executed.
> 
> There could be checks for more error bits but R1_ERASE_SEQ_ERROR
> on CMD38 should catch all that are reported by hardware anyway.
> 
> Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
> ---
>  mmc_cmds.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
> 
> diff --git a/mmc_cmds.c b/mmc_cmds.c
> index e6d3273..c00fe5e 100644
> --- a/mmc_cmds.c
> +++ b/mmc_cmds.c
> @@ -54,6 +54,7 @@
>  #define WPTYPE_PWRON 2
>  #define WPTYPE_PERM 3
> 
> +#define ERASE_R1_ERR_MASK (R1_ERASE_SEQ_ERROR | R1_ERASE_PARAM |
> R1_ERASE_RESET)
> 
>  int read_extcsd(int fd, __u8 *ext_csd)
>  {
> @@ -2668,6 +2669,23 @@ static int erase(int dev_fd, __u32 argin, __u32 start,
> __u32 end)
>         if (ret)
>                 perror("Erase multi-cmd ioctl");
> 
> +       /* Does not work for SPI cards */
> +       if (multi_cmd->cmds[0].response[0] & ERASE_R1_ERR_MASK) {
> +               fprintf(stderr, "Erase start response: %08x\n",
> +                               multi_cmd->cmds[0].response[0]);
> +               ret = -EIO;
> +       }
> +       if (multi_cmd->cmds[1].response[0] & ERASE_R1_ERR_MASK) {
> +               fprintf(stderr, "Erase end response: %08x\n",
> +                               multi_cmd->cmds[1].response[0]);
> +               ret = -EIO;
> +       }
> +       if (multi_cmd->cmds[2].response[0] & ERASE_R1_ERR_MASK) {
> +               fprintf(stderr, "Erase response: %08x\n",
> +                               multi_cmd->cmds[2].response[0]);
> +               ret = -EIO;
> +       }
> +
AFAIK the device will not set those bits in the command responses,
but those are only available to read in the status register.
Let me check.

Thanks,
Avri

>         free(multi_cmd);
>         return ret;
>  }
> --
> 2.37.3
> 
> 
> Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
> Managing Director: Dr. Jan Peter Berns.
> Commercial register of local courts: Freiburg HRB381782
Christian Loehle Dec. 12, 2022, 2 p.m. UTC | #2
-----Original Message-----
From: Avri Altman <Avri.Altman@wdc.com> 
Sent: Montag, 12. Dezember 2022 14:45
To: Christian Löhle <CLoehle@hyperstone.com>; ulf.hansson@linaro.org; adrian.hunter@intel.com; linux-mmc@vger.kernel.org
Subject: RE: [PATCH] mmc-utils: Add basic erase error check

>> Check for erase specific R1 errors so e.g. an OOR erase is not 
>> reported as successful when it never executed.
>> 
>> There could be checks for more error bits but R1_ERASE_SEQ_ERROR on 
>> CMD38 should catch all that are reported by hardware anyway.
>> 
>> Signed-off-by: Christian Loehle <cloehle@hyperstone.com>
>> ---
>>  mmc_cmds.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>> 
>> diff --git a/mmc_cmds.c b/mmc_cmds.c
>> index e6d3273..c00fe5e 100644
>> --- a/mmc_cmds.c
>> +++ b/mmc_cmds.c
>> @@ -54,6 +54,7 @@
>>  #define WPTYPE_PWRON 2
>>  #define WPTYPE_PERM 3
>> 
>> +#define ERASE_R1_ERR_MASK (R1_ERASE_SEQ_ERROR | R1_ERASE_PARAM |
>> R1_ERASE_RESET)
>> 
>>  int read_extcsd(int fd, __u8 *ext_csd)  { @@ -2668,6 +2669,23 @@ 
>> static int erase(int dev_fd, __u32 argin, __u32 start,
>> __u32 end)
>>         if (ret)
>>                 perror("Erase multi-cmd ioctl");
>> 
>> +       /* Does not work for SPI cards */
>> +       if (multi_cmd->cmds[0].response[0] & ERASE_R1_ERR_MASK) {
>> +               fprintf(stderr, "Erase start response: %08x\n",
>> +                               multi_cmd->cmds[0].response[0]);
>> +               ret = -EIO;
>> +       }
>> +       if (multi_cmd->cmds[1].response[0] & ERASE_R1_ERR_MASK) {
>> +               fprintf(stderr, "Erase end response: %08x\n",
>> +                               multi_cmd->cmds[1].response[0]);
>> +               ret = -EIO;
>> +       }
>> +       if (multi_cmd->cmds[2].response[0] & ERASE_R1_ERR_MASK) {
>> +               fprintf(stderr, "Erase response: %08x\n",
>> +                               multi_cmd->cmds[2].response[0]);
>> +               ret = -EIO;
>> +       }
>> +
> AFAIK the device will not set those bits in the command responses, but those are only available to read in the status register.
> Let me check.
>
> Thanks,
> Avri

So any R1 response which all of the three are? Or am I misunderstanding you?

Anyway the (eMMC) spec reads:
"If an erase command (either CMD35, CMD36, CMD38) is received out of the defined erase sequence, the Device shall set the ERASE_SEQ_ERROR bit in the status register and reset the whole sequence. If the host provides an out of range address as an argument to CMD35 or CMD36, the Device will reject the command, respond with the ADDRESS_OUT_OF_RANGE bit set and reset the whole erase sequence."
And for the cards that I've tried this holds true.
For SD a CMD13 after CMD38 is required, too.
I guess I can add that.

Regards,
Christian

Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782
Christian Loehle Dec. 12, 2022, 2:08 p.m. UTC | #3
> For SD a CMD13 after CMD38 is required, too.
> I guess I can add that.

Just realized that sending CMD13 is not sufficient as the kernel will poll because of R1B and clear the error flag.
Anyway I have this kernel patch for a write flag bit that aggregates errors during polling until card is in TRAN again.
I will send it then, since I don't think there is a good way of solving this for SD in mmc-utils, please consider this patch on its own.

Regards,
Christian

Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782
Avri Altman Dec. 13, 2022, 2 p.m. UTC | #4
> > For SD a CMD13 after CMD38 is required, too.
> > I guess I can add that.
> 
> Just realized that sending CMD13 is not sufficient as the kernel will poll
> because of R1B and clear the error flag.
> Anyway I have this kernel patch for a write flag bit that aggregates errors
> during polling until card is in TRAN again.
> I will send it then, since I don't think there is a good way of solving this for SD in
> mmc-utils, please consider this patch on its own.
Leaving SD aside for now - I Still wasn't able to get an expert opinion - holiday season etc. 
While waiting however, looking in Table 70 - Device Status field/command - cross reference, I can see that :
- ERASE_RESET - is not reported for any of cmd35, cmd36, and cmd38
- ERASE_PARAM - is 'X' for cmd35 only
- ERASE_SEQ_ ERROR - is 'R' for cmd35 and cmd36

So potentially only ERASE_SEQ_ ERROR may reside in those commands responses.
But mmc-utils uses multi-ioctl for that, so there couldn't be any mis-ordering?
Which error bits you see in which command responses?

Thanks,
Avri
> 
> Regards,
> Christian
> 
> Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz Managing Director:
> Dr. Jan Peter Berns.
> Commercial register of local courts: Freiburg HRB381782
Christian Loehle Dec. 13, 2022, 3:26 p.m. UTC | #5
>> > For SD a CMD13 after CMD38 is required, too.
>> > I guess I can add that.
>> 
>> Just realized that sending CMD13 is not sufficient as the kernel will 
>> poll because of R1B and clear the error flag.
>> Anyway I have this kernel patch for a write flag bit that aggregates 
>> errors during polling until card is in TRAN again.
>> I will send it then, since I don't think there is a good way of 
>> solving this for SD in mmc-utils, please consider this patch on its own.
> Leaving SD aside for now - I Still wasn't able to get an expert opinion - holiday season etc. 
> While waiting however, looking in Table 70 - Device Status field/command - cross reference, I can see that :
> - ERASE_RESET - is not reported for any of cmd35, cmd36, and cmd38
> - ERASE_PARAM - is 'X' for cmd35 only
> - ERASE_SEQ_ ERROR - is 'R' for cmd35 and cmd36
>
> So potentially only ERASE_SEQ_ ERROR may reside in those commands responses.
> But mmc-utils uses multi-ioctl for that, so there couldn't be any mis-ordering?
> Which error bits you see in which command responses?
> 
> Thanks,
> Avri

Hey Avri,
Thanks for having a look at this.
Yeah I have indeed only seen ERASE_SEQ_ERROR, but added the rest because it doesn't hurt.
Why would you say ERASE_PARAM shouldn't be checked? Or are you arguing it should only be checked at CMD38 (i.e. the X of CMD35)?
(In which case I agree, just didn't want more than one error mask)
Seeing a ERASE_PARAM would definitely make the erase not happen (that's all mmc-utils should really care about).
ERASE_RESET may be removed for sure, could be checked when a SD ioctl erase fix like I described is in the kernel.
Then an ideal mmc-utils erase operation checks that no relevant R1 bits are set at CMD38 RSP and all CMD13 until card leaves PRG and stops signalling busy.


For an erase call like this:
mmc-utils erase legacy 0 0xFFFFFFFF /dev/mmcblk2

the MMC trace (according to spec and most cards I tried, this is one of them) looks like this:
Format: timestamp,type,CMDOPCODE,
for type 1 (CMD) and type 2 (Resp48) then the next field is arg/response in hex.
I guess rest is more or less self-explanatory / not relevant.

325533.758261654,1,13,00010000
325533.758282029,2,13,00000900, READY_FOR_DATA, TRANS_STATE
325534.549850485,1,08,00000000
325534.549874693,2,08,00000900, READY_FOR_DATA, TRANS_STATE
325534.550132818,4,08,0,512
325534.550132818,5,08,00000000000000000000000... REDACTED FOR SIZE
325534.550693693,1,35,00000000
325534.550710026,2,35,00000900, READY_FOR_DATA, TRANS_STATE
325534.550761651,1,36,ffffffff
325534.550774485,2,36,80000900, ADDRESS_OUT_OF_RANGE, READY_FOR_DATA, TRANS_STATE
325534.552136276,1,38,00000000
325534.552164568,2,38,10000900, ERASE_SEQ_ERROR, READY_FOR_DATA, TRANS_STATE
325534.552227568,1,13,00010000
325534.552241276,2,13,00000900, READY_FOR_DATA, TRANS_STATE

Hope that helps.

Regards,
Christian

Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782
Avri Altman Dec. 14, 2022, 8:15 a.m. UTC | #6
> -----Original Message-----
> From: Christian Löhle <CLoehle@hyperstone.com>
> Sent: Tuesday, December 13, 2022 5:26 PM
> To: Avri Altman <Avri.Altman@wdc.com>
> Cc: linux-mmc@vger.kernel.org; ulf.hansson@linaro.org
> Subject: RE: [PATCH] mmc-utils: Add basic erase error check
> 
> CAUTION: This email originated from outside of Western Digital. Do not click
> on links or open attachments unless you recognize the sender and know that
> the content is safe.
> 
> 
> >> > For SD a CMD13 after CMD38 is required, too.
> >> > I guess I can add that.
> >>
> >> Just realized that sending CMD13 is not sufficient as the kernel will
> >> poll because of R1B and clear the error flag.
> >> Anyway I have this kernel patch for a write flag bit that aggregates
> >> errors during polling until card is in TRAN again.
> >> I will send it then, since I don't think there is a good way of
> >> solving this for SD in mmc-utils, please consider this patch on its own.
> > Leaving SD aside for now - I Still wasn't able to get an expert opinion -
> holiday season etc.
Got it now.
Although table 70 seems simple and definitive - Apparently I misinterpret it.
Sorry about that.
 
> > While waiting however, looking in Table 70 - Device Status field/command -
> cross reference, I can see that :
> > - ERASE_RESET - is not reported for any of cmd35, cmd36, and cmd38
> > - ERASE_PARAM - is 'X' for cmd35 only
> > - ERASE_SEQ_ ERROR - is 'R' for cmd35 and cmd36
> >
> > So potentially only ERASE_SEQ_ ERROR may reside in those commands
> responses.
> > But mmc-utils uses multi-ioctl for that, so there couldn't be any mis-
> ordering?
> > Which error bits you see in which command responses?
> >
> > Thanks,
> > Avri
> 
> Hey Avri,
> Thanks for having a look at this.
> Yeah I have indeed only seen ERASE_SEQ_ERROR, but added the rest
> because it doesn't hurt.
ERASE_SEQ_ERROR - It is set when e.g. CMD36 is given before CMD35
e.g. the following CMDs are issued CMD36 -> CMD35 -> CMD38 
In the response of CMD36, ERASE_SEQ_ERROR error bit will be reported.
This can't be happening for mmc-utils because it send the correct sequence using multi-ioctl - 
Which assure it can't be interrupted by any other command.
The trace that you attached looks like a fw bug to me.

> Why would you say ERASE_PARAM shouldn't be checked? Or are you arguing
> it should only be checked at CMD38 (i.e. the X of CMD35)?
ERASE_PARAM - It is set when wrong address is given for first erase group in CMD35, 
And is reported in the response of CMD36.
This is actually the only error bit we should look for.

> (In which case I agree, just didn't want more than one error mask)
> Seeing a ERASE_PARAM would definitely make the erase not happen (that's
> all mmc-utils should really care about).
> ERASE_RESET may be removed for sure, could be checked when a SD ioctl
> erase fix like I described is in the kernel.
ERASE_RESET is set when any other commands apart from CMD35/36/38/13 is sent during the erase sequence.
This as aforesaid, can't be happening for mmc-utils.

> Then an ideal mmc-utils erase operation checks that no relevant R1 bits are
> set at CMD38 RSP and all CMD13 until card leaves PRG and stops signalling
> busy.
Are you suggesting that we should include cmd13 as the 4th command in the erase sequence?
I am not sure we need it.

To conclude, IMO only ERASE_PARAM should be checked in the response of CMD36.

Thanks,
Avri
> 
> 
> For an erase call like this:
> mmc-utils erase legacy 0 0xFFFFFFFF /dev/mmcblk2
> 
> the MMC trace (according to spec and most cards I tried, this is one of them)
> looks like this:
> Format: timestamp,type,CMDOPCODE,
> for type 1 (CMD) and type 2 (Resp48) then the next field is arg/response in
> hex.
> I guess rest is more or less self-explanatory / not relevant.
> 
> 325533.758261654,1,13,00010000
> 325533.758282029,2,13,00000900, READY_FOR_DATA, TRANS_STATE
> 325534.549850485,1,08,00000000
> 325534.549874693,2,08,00000900, READY_FOR_DATA, TRANS_STATE
> 325534.550132818,4,08,0,512
> 325534.550132818,5,08,00000000000000000000000... REDACTED FOR SIZE
> 325534.550693693,1,35,00000000
> 325534.550710026,2,35,00000900, READY_FOR_DATA, TRANS_STATE
> 325534.550761651,1,36,ffffffff
> 325534.550774485,2,36,80000900, ADDRESS_OUT_OF_RANGE,
> READY_FOR_DATA, TRANS_STATE
> 325534.552136276,1,38,00000000
> 325534.552164568,2,38,10000900, ERASE_SEQ_ERROR, READY_FOR_DATA,
> TRANS_STATE
> 325534.552227568,1,13,00010000
> 325534.552241276,2,13,00000900, READY_FOR_DATA, TRANS_STATE
> 
> Hope that helps.
> 
> Regards,
> Christian
> 
> Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
> Managing Director: Dr. Jan Peter Berns.
> Commercial register of local courts: Freiburg HRB381782
Christian Loehle Dec. 15, 2022, 6:14 p.m. UTC | #7
Hey Avri,

>> Yeah I have indeed only seen ERASE_SEQ_ERROR, but added the rest 
>> because it doesn't hurt.
> ERASE_SEQ_ERROR - It is set when e.g. CMD36 is given before CMD35 e.g. the following CMDs are issued CMD36 -> CMD35 -> CMD38 In the response of CMD36, ERASE_SEQ_ERROR error bit will be reported.
> This can't be happening for mmc-utils because it send the correct sequence using multi-ioctl - Which assure it can't be interrupted by any other command.
> The trace that you attached looks like a fw bug to me.

I disagree, requoting the eMMC spec I quoted originally, considering an OOR erase address:
'If the host provides an out of range address as an argument to CMD35 or CMD36, the Device will reject the command, respond with the ADDRESS_OUT_OF_RANGE bit set and reset the whole erase sequence."
So CMD38 must trigger ERASE_SEQ_ERROR for OOR on MMC and it is a (HW) bug if it doesn't.

>
>> Why would you say ERASE_PARAM shouldn't be checked? Or are you arguing 
>> it should only be checked at CMD38 (i.e. the X of CMD35)?
> ERASE_PARAM - It is set when wrong address is given for first erase group in CMD35, And is reported in the response of CMD36.
> This is actually the only error bit we should look for.
>
>> (In which case I agree, just didn't want more than one error mask) 
>> Seeing a ERASE_PARAM would definitely make the erase not happen 
>> (that's all mmc-utils should really care about).
>> ERASE_RESET may be removed for sure, could be checked when a SD ioctl 
>> erase fix like I described is in the kernel.
> ERASE_RESET is set when any other commands apart from CMD35/36/38/13 is sent during the erase sequence.
> This as aforesaid, can't be happening for mmc-utils.

I disagree here, too.
Just because we are not sending it doesn't mean the card does not see it.
The card may sample a CMD start bit at some time during the sequence, will not answer because of ILLEGAL / CRC mismatch.
The sequence is reset anyway (out of sequence command received). This is not unthinkable on neither SD nor MMC bus.
ERASE_SEQ_ERROR at CMD38 will catch that, too, though.

>
>> Then an ideal mmc-utils erase operation checks that no relevant R1 
>> bits are set at CMD38 RSP and all CMD13 until card leaves PRG and 
>> stops signalling busy.
> Are you suggesting that we should include cmd13 as the 4th command in the erase sequence?
> I am not sure we need it.

I am.
At any point there is some operation making changes to the flash, be it writes or erases, the busy will be asserted (assuming cache off).
When busy is deasserted host needs to check why this was the case.
Consider a flash voltage drop, card may signal e.g. the general error bits, but behavior depends on the card of course.
If issuing a secure erase, I would like to know if busy of CMD38 was deasserted because of successful completion or some other error and erase was in fact not fully executed.
mmc-utils cannot fix this on its own, so for now let's restrict this patch to OOR erases and so on for MMC-only.

> To conclude, IMO only ERASE_PARAM should be checked in the response of CMD36.
I think ERASE_PARAM should be checked for CMD36, ERASE_SEQ_ERROR for CMD38, I'm fine with removing ERASE_RESET check for the patch, it is caught at CMD38 ERASE_SEQ_ERROR anyway.
What do you say?

Regards,
Christian

Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
Managing Director: Dr. Jan Peter Berns.
Commercial register of local courts: Freiburg HRB381782
Avri Altman Dec. 16, 2022, 6:11 a.m. UTC | #8
> 
> Hey Avri,
> 
> >> Yeah I have indeed only seen ERASE_SEQ_ERROR, but added the rest
> >> because it doesn't hurt.
> > ERASE_SEQ_ERROR - It is set when e.g. CMD36 is given before CMD35 e.g.
> the following CMDs are issued CMD36 -> CMD35 -> CMD38 In the response
> of CMD36, ERASE_SEQ_ERROR error bit will be reported.
> > This can't be happening for mmc-utils because it send the correct sequence
> using multi-ioctl - Which assure it can't be interrupted by any other
> command.
> > The trace that you attached looks like a fw bug to me.
> 
> I disagree, requoting the eMMC spec I quoted originally, considering an OOR
> erase address:
> 'If the host provides an out of range address as an argument to CMD35 or
> CMD36, the Device will reject the command, respond with the
> ADDRESS_OUT_OF_RANGE bit set and reset the whole erase sequence."
> So CMD38 must trigger ERASE_SEQ_ERROR for OOR on MMC and it is a (HW)
> bug if it doesn't.
It makes sense but let me confirm first and get back to you on that.

> 
> >
> >> Why would you say ERASE_PARAM shouldn't be checked? Or are you
> arguing
> >> it should only be checked at CMD38 (i.e. the X of CMD35)?
> > ERASE_PARAM - It is set when wrong address is given for first erase group
> in CMD35, And is reported in the response of CMD36.
> > This is actually the only error bit we should look for.
> >
> >> (In which case I agree, just didn't want more than one error mask)
> >> Seeing a ERASE_PARAM would definitely make the erase not happen
> >> (that's all mmc-utils should really care about).
> >> ERASE_RESET may be removed for sure, could be checked when a SD ioctl
> >> erase fix like I described is in the kernel.
> > ERASE_RESET is set when any other commands apart from
> CMD35/36/38/13 is sent during the erase sequence.
> > This as aforesaid, can't be happening for mmc-utils.
> 
> I disagree here, too.
> Just because we are not sending it doesn't mean the card does not see it.
> The card may sample a CMD start bit at some time during the sequence, will
> not answer because of ILLEGAL / CRC mismatch.
> The sequence is reset anyway (out of sequence command received). This is
> not unthinkable on neither SD nor MMC bus.
Vanishingly remotely as there are no cmd nor data error, but ok.
 
> ERASE_SEQ_ERROR at CMD38 will catch that, too, though.
> 
> >
> >> Then an ideal mmc-utils erase operation checks that no relevant R1
> >> bits are set at CMD38 RSP and all CMD13 until card leaves PRG and
> >> stops signalling busy.
> > Are you suggesting that we should include cmd13 as the 4th command in
> the erase sequence?
> > I am not sure we need it.
> 
> I am.
> At any point there is some operation making changes to the flash, be it
> writes or erases, the busy will be asserted (assuming cache off).
> When busy is deasserted host needs to check why this was the case.
> Consider a flash voltage drop, card may signal e.g. the general error bits, but
> behavior depends on the card of course.
> If issuing a secure erase, I would like to know if busy of CMD38 was
> deasserted because of successful completion or some other error and erase
> was in fact not fully executed.
> mmc-utils cannot fix this on its own, so for now let's restrict this patch to
> OOR erases and so on for MMC-only.
OK.  Please try to reuse the send-status code we already have.

> 
> > To conclude, IMO only ERASE_PARAM should be checked in the response
> of CMD36.
> I think ERASE_PARAM should be checked for CMD36, ERASE_SEQ_ERROR for
> CMD38, I'm fine with removing ERASE_RESET check for the patch, it is caught
> at CMD38 ERASE_SEQ_ERROR anyway.
> What do you say?
Please let me confirm first about ERASE_SEQ_ERROR in cmd38.

Thanks,
Avri

> 
> Regards,
> Christian
> 
> Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz
> Managing Director: Dr. Jan Peter Berns.
> Commercial register of local courts: Freiburg HRB381782
Avri Altman Jan. 2, 2023, 12:01 p.m. UTC | #9
> > Hey Avri,
> >
> > >> Yeah I have indeed only seen ERASE_SEQ_ERROR, but added the rest
> > >> because it doesn't hurt.
> > > ERASE_SEQ_ERROR - It is set when e.g. CMD36 is given before CMD35 e.g.
> > the following CMDs are issued CMD36 -> CMD35 -> CMD38 In the response
> > of CMD36, ERASE_SEQ_ERROR error bit will be reported.
> > > This can't be happening for mmc-utils because it send the correct
> > > sequence
> > using multi-ioctl - Which assure it can't be interrupted by any other
> > command.
> > > The trace that you attached looks like a fw bug to me.
> >
> > I disagree, requoting the eMMC spec I quoted originally, considering
> > an OOR erase address:
> > 'If the host provides an out of range address as an argument to CMD35
> > or CMD36, the Device will reject the command, respond with the
> > ADDRESS_OUT_OF_RANGE bit set and reset the whole erase sequence."
> > So CMD38 must trigger ERASE_SEQ_ERROR for OOR on MMC and it is a
> (HW)
> > bug if it doesn't.
> It makes sense but let me confirm first and get back to you on that.
> 
> >
> > >
> > >> Why would you say ERASE_PARAM shouldn't be checked? Or are you
> > arguing
> > >> it should only be checked at CMD38 (i.e. the X of CMD35)?
> > > ERASE_PARAM - It is set when wrong address is given for first erase
> > > group
> > in CMD35, And is reported in the response of CMD36.
> > > This is actually the only error bit we should look for.
> > >
> > >> (In which case I agree, just didn't want more than one error mask)
> > >> Seeing a ERASE_PARAM would definitely make the erase not happen
> > >> (that's all mmc-utils should really care about).
> > >> ERASE_RESET may be removed for sure, could be checked when a SD
> > >> ioctl erase fix like I described is in the kernel.
> > > ERASE_RESET is set when any other commands apart from
> > CMD35/36/38/13 is sent during the erase sequence.
> > > This as aforesaid, can't be happening for mmc-utils.
> >
> > I disagree here, too.
> > Just because we are not sending it doesn't mean the card does not see it.
> > The card may sample a CMD start bit at some time during the sequence,
> > will not answer because of ILLEGAL / CRC mismatch.
> > The sequence is reset anyway (out of sequence command received). This
> > is not unthinkable on neither SD nor MMC bus.
> Vanishingly remotely as there are no cmd nor data error, but ok.
> 
> > ERASE_SEQ_ERROR at CMD38 will catch that, too, though.
> >
> > >
> > >> Then an ideal mmc-utils erase operation checks that no relevant R1
> > >> bits are set at CMD38 RSP and all CMD13 until card leaves PRG and
> > >> stops signalling busy.
> > > Are you suggesting that we should include cmd13 as the 4th command
> > > in
> > the erase sequence?
> > > I am not sure we need it.
> >
> > I am.
> > At any point there is some operation making changes to the flash, be
> > it writes or erases, the busy will be asserted (assuming cache off).
> > When busy is deasserted host needs to check why this was the case.
> > Consider a flash voltage drop, card may signal e.g. the general error
> > bits, but behavior depends on the card of course.
> > If issuing a secure erase, I would like to know if busy of CMD38 was
> > deasserted because of successful completion or some other error and
> > erase was in fact not fully executed.
> > mmc-utils cannot fix this on its own, so for now let's restrict this
> > patch to OOR erases and so on for MMC-only.
> OK.  Please try to reuse the send-status code we already have.
> 
> >
> > > To conclude, IMO only ERASE_PARAM should be checked in the response
> > of CMD36.
> > I think ERASE_PARAM should be checked for CMD36, ERASE_SEQ_ERROR for
> > CMD38, I'm fine with removing ERASE_RESET check for the patch, it is
> > caught at CMD38 ERASE_SEQ_ERROR anyway.
> > What do you say?
OK - Lets continue with our current understandings.

Btw, you might want to consider support for SD as well (in a different commit) - 
SD doesn't support cmd35 & cmd36, but cmd32 & cmd33 instead,
and its response nuances are defined in the SD spec.

Thanks,
Avri

> Please let me confirm first about ERASE_SEQ_ERROR in cmd38.
> 
> Thanks,
> Avri
> 
> >
> > Regards,
> > Christian
> >
> > Hyperstone GmbH | Reichenaustr. 39a  | 78467 Konstanz Managing
> > Director: Dr. Jan Peter Berns.
> > Commercial register of local courts: Freiburg HRB381782
diff mbox series

Patch

diff --git a/mmc_cmds.c b/mmc_cmds.c
index e6d3273..c00fe5e 100644
--- a/mmc_cmds.c
+++ b/mmc_cmds.c
@@ -54,6 +54,7 @@ 
 #define WPTYPE_PWRON 2
 #define WPTYPE_PERM 3
 
+#define ERASE_R1_ERR_MASK (R1_ERASE_SEQ_ERROR | R1_ERASE_PARAM | R1_ERASE_RESET)
 
 int read_extcsd(int fd, __u8 *ext_csd)
 {
@@ -2668,6 +2669,23 @@  static int erase(int dev_fd, __u32 argin, __u32 start, __u32 end)
 	if (ret)
 		perror("Erase multi-cmd ioctl");
 
+	/* Does not work for SPI cards */
+	if (multi_cmd->cmds[0].response[0] & ERASE_R1_ERR_MASK) {
+		fprintf(stderr, "Erase start response: %08x\n",
+				multi_cmd->cmds[0].response[0]);
+		ret = -EIO;
+	}
+	if (multi_cmd->cmds[1].response[0] & ERASE_R1_ERR_MASK) {
+		fprintf(stderr, "Erase end response: %08x\n",
+				multi_cmd->cmds[1].response[0]);
+		ret = -EIO;
+	}
+	if (multi_cmd->cmds[2].response[0] & ERASE_R1_ERR_MASK) {
+		fprintf(stderr, "Erase response: %08x\n",
+				multi_cmd->cmds[2].response[0]);
+		ret = -EIO;
+	}
+
 	free(multi_cmd);
 	return ret;
 }