[v2] mmc: renesas_sdhi: Fix change point of data handling

Message ID 20240117110646.1317843-1-claudiu.beznea.uj@bp.renesas.com (mailing list archive)
State Superseded
Delegated to: Geert Uytterhoeven

Commit Message

Claudiu Beznea Jan. 17, 2024, 11:06 a.m. UTC
From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

On recent kernel revisions it has been noticed (on an RZ/G3S system) that
when booting Linux with the root file system on eMMC, at some point in
the boot process, when the systemd applications are started, the
"mmc0: tuning execution failed: -5" message is displayed on the console.
On kernel v6.7-rc5 this is reproducible in 90% of the boots. This was
not seen on the same system with kernel v6.5.0-rc1. It was also noticed on
kernel revisions v6.6-rcX on an RZ/G2UL based system but not on the kernel
this fix is based on (v6.7-rc5).

Investigating it on RZ/G3S led to the conclusion that every time the issue
is reproduced all the probed TAPs are OK. According to the datasheet, when
this happens the change point of data needs to be considered for tuning.

The previous code assumed that the change point of data occurs when the
content of the SMPCMP register is zero. According to the RZ/V2M hardware
manual, chapter "Change Point of the Input Data" (as this is the clearest
description that I've found about the change point of the input data and
all RZ hardware manuals are similar in this chapter), at the time of tuning,
data is captured by the previous and next TAPs and the result is stored in
the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
If there is a mismatch between the previous and the next TAPs, it indicates
that there is a change point of the input data.

To comply with this, the code checks if this mismatch is present and
updates the priv->smpcmp mask.
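
For illustration only, the field comparison described above can be sketched
as stand-alone C. This is not the driver code: the helper names are
hypothetical, and the field positions follow the GENMASK(23, 16) and
GENMASK(7, 0) definitions used in the patch rather than the "22..16"
wording in the text. The sketch only reports whether the two fields
differ; how the result should feed into TAP selection is what the review
below discusses.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers; positions assumed from the patch's GENMASK() lines. */
#define SMPCMP_CMPNGU_DATA_SHIFT	16	/* "previous TAP" result field */
#define SMPCMP_CMPNGD_DATA_SHIFT	0	/* "next TAP" result field */
#define SMPCMP_DATA_FIELD_MASK		0xffu

/* Extract the CMPNGU_DATA field (bits 23..16) from a raw SMPCMP value. */
static inline uint32_t smpcmp_cmpngu_data(uint32_t smpcmp)
{
	return (smpcmp >> SMPCMP_CMPNGU_DATA_SHIFT) & SMPCMP_DATA_FIELD_MASK;
}

/* Extract the CMPNGD_DATA field (bits 7..0) from a raw SMPCMP value. */
static inline uint32_t smpcmp_cmpngd_data(uint32_t smpcmp)
{
	return (smpcmp >> SMPCMP_CMPNGD_DATA_SHIFT) & SMPCMP_DATA_FIELD_MASK;
}

/* True when the two data fields disagree; the CMD bits (8, 24) are ignored. */
static inline bool smpcmp_fields_mismatch(uint32_t smpcmp)
{
	return smpcmp_cmpngu_data(smpcmp) != smpcmp_cmpngd_data(smpcmp);
}
```

Note that a non-zero register value does not imply a mismatch under this
comparison: 0x00030003 has equal fields, which is one way this check
differs from the existing "SMPCMP == 0" test.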

This change has been checked on the devices with the following DTSes by
doing 50 consecutive reboots and checking for the tuning failure message:
- r9a08g045s33-smarc.dts
- r8a7742-iwg21d-q7.dts
- r8a7743-iwg20d-q7.dts
- r8a7744-iwg20d-q7.dts
- r8a7745-iwg22d-sodimm.dts
- r8a77470-iwg23s-sbc.dts
- r8a774a1-hihope-rzg2m-ex.dts
- r8a774b1-hihope-rzg2n-ex.dts
- r8a774c0-ek874.dts
- r8a774e1-hihope-rzg2h-ex.dts
- r9a07g043u11-smarc-rzg2ul.dts
- r9a07g044c2-smarc-rzg2lc.dts
- r9a07g044l2-smarc-rzg2l.dts
- r9a07g054l2-smarc-rzv2l.dts

On r8a774a1-hihope-rzg2m-ex, even though the hardware manual doesn't say
anything special about it in the "Change Point of the Input Data" chapter
or the SMPCMP register description, it has been noticed that although all
TAPs probed in the tuning process are OK, SMPCMP is zero. To handle this,
the renesas_sdhi_select_tuning() function was updated to use priv->taps
when all TAPs are OK and no change point was recorded.

Fixes: 5fb6bf51f6d1 ("mmc: renesas_sdhi: improve TAP selection if all TAPs are good")
Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
---

Changes in v2:
- read the SH_MOBILE_SDHI_SCC_SMPCMP register only on success path of
  mmc_send_tuning()

 drivers/mmc/host/renesas_sdhi_core.c | 27 ++++++++++++++++++++++-----
 1 file changed, 22 insertions(+), 5 deletions(-)

Comments

Wolfram Sang Jan. 17, 2024, 2:06 p.m. UTC | #1
On Wed, Jan 17, 2024 at 01:06:46PM +0200, Claudiu wrote:
> From: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>
> 
> On latest kernel revisions it has been noticed (on a RZ/G3S system) that
> when booting Linux and root file system is on eMMC, at some point in
> the booting process, when the systemd applications are started, the
> "mmc0: tuning execution failed: -5" message is displayed on console.
> On kernel v6.7-rc5 this is reproducible in 90% of the boots. This was
> missing on the same system with kernel v6.5.0-rc1. It was also noticed on
> kernel revisions v6.6-rcX on a RZ/G2UL based system but not on the kernel
> this fix is based on (v6.7-rc5).
> 
> Investigating it on RZ/G3S lead to the conclusion that every time the issue
> is reproduced all the probed TAPs are OK. According to datasheet, when this
> happens the change point of data need to be considered for tuning.
> 
> Previous code considered the change point of data happens when the content
> of the SMPCMP register is zero. According to RZ/V2M hardware manual,
> chapter "Change Point of the Input Data" (as this is the most clear
> description that I've found about change point of the input data and all
> RZ hardware manual are similar on this chapter), at the time of tuning,
> data is captured by the previous and next TAPs and the result is stored in
> the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
> If there is a mismatch b/w the previous and the next TAPs, it indicates
> that there is a change point of the input data.
> 
> To comply with this, the code checks if this mismatch is present and
> updates the priv->smpcmp mask.
> 
> This change has been checked on the devices with the following DTSes by
> doing 50 consecutive reboots and checking for the tuning failure message:
> - r9a08g045s33-smarc.dts
> - r8a7742-iwg21d-q7.dts
> - r8a7743-iwg20d-q7.dts
> - r8a7744-iwg20d-q7.dts
> - r8a7745-iwg22d-sodimm.dts
> - r8a77470-iwg23s-sbc.dts
> - r8a774a1-hihope-rzg2m-ex.dts
> - r8a774b1-hihope-rzg2n-ex.dts
> - r8a774c0-ek874.dts
> - r8a774e1-hihope-rzg2h-ex.dts
> - r9a07g043u11-smarc-rzg2ul.dts
> - r9a07g044c2-smarc-rzg2lc.dts
> - r9a07g044l2-smarc-rzg2l.dts
> - r9a07g054l2-smarc-rzv2l.dts
> 
> On r8a774a1-hihope-rzg2m-ex, even though the hardware manual doesn't say
> anything special about it in the "Change Point of the Input Data" chapter
> or SMPCMP register description, it has been noticed that although all TAPs
> probed in the tuning process are OK the SMPCMP is zero. For this updated
> the renesas_sdhi_select_tuning() function to use priv->taps in case all
> TAPs are OK.
> 
> Fixes: 5fb6bf51f6d1 ("mmc: renesas_sdhi: improve TAP selection if all TAPs are good")
> Signed-off-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com>

Very interesting patch! Please give me a few days to review/test it.
Wolfram Sang Jan. 29, 2024, 10:43 a.m. UTC | #2
> Very interesting patch! Please give me a few days to review/test it.

I am still at it. I got some objections from Renesas and am trying to
figure out more details.
Wolfram Sang Jan. 29, 2024, 10:55 a.m. UTC | #3
Hi Claudiu,

but one thing I can ask already:

> Investigating it on RZ/G3S lead to the conclusion that every time the issue
> is reproduced all the probed TAPs are OK. According to datasheet, when this
> happens the change point of data need to be considered for tuning.

Yes, "considered" means here it should be *avoided*.

> Previous code considered the change point of data happens when the content
> of the SMPCMP register is zero. According to RZ/V2M hardware manual,

When SMPCMP is zero, there is *no* change point. Which is good.

> chapter "Change Point of the Input Data" (as this is the most clear
> description that I've found about change point of the input data and all
> RZ hardware manual are similar on this chapter),

I also have a chapter named like this. If you check the diagram, change
point is between TAP2 and 3, so the suggested TAP to use is 6 or 7. As
far away as possible from the change point.

> at the time of tuning,
> data is captured by the previous and next TAPs and the result is stored in
> the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
> If there is a mismatch b/w the previous and the next TAPs, it indicates
> that there is a change point of the input data.

This is correct.

> To comply with this, the code checks if this mismatch is present and
> updates the priv->smpcmp mask.

That means you select the "change point" instead of avoiding it?

> This change has been checked on the devices with the following DTSes by
> doing 50 consecutive reboots and checking for the tuning failure message:

Okay, you might not have a failure message, but you might have selected
the worst TAP. Or?

> +			if (cmpngu_data != cmpngd_data)
> +				set_bit(i, priv->smpcmp);

Really looks like you select the change point instead of avoiding it.

However, with some SD cards, I also see the EIO error you see. So, there
might be room to improve TAP selection when all TAPs are good. I need to
check if this really is the same case for the SD cards in question.

Happy hacking,

   Wolfram
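
As a rough illustration of the "as far away as possible from the change
point" rule in the reply above, the arithmetic can be sketched as follows.
This is only a sketch under stated assumptions, not what the driver does:
the function names are hypothetical, and it assumes the TAPs form a ring
of num_taps positions covering one UI (8 in the manual's example, so that
TAP7 is followed by TAP0), with num_taps even and non-zero.

```c
/*
 * first_tap_after_change is the first TAP after the change point
 * (3 when the change point lies between TAP2 and TAP3).
 * The point opposite the change point on the ring falls between two
 * TAPs; these helpers return the upper and lower of those two.
 */
static unsigned int tap_far_hi(unsigned int first_tap_after_change,
			       unsigned int num_taps)
{
	return (first_tap_after_change + num_taps / 2) % num_taps;
}

static unsigned int tap_far_lo(unsigned int first_tap_after_change,
			       unsigned int num_taps)
{
	/* One position below tap_far_hi(), wrapping around the ring. */
	return (first_tap_after_change + num_taps / 2 + num_taps - 1) % num_taps;
}
```

With a change point between TAP2 and TAP3 on an 8-TAP ring, the two
helpers give 6 and 7, matching the TAPs named in the manual's example.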
Claudiu Beznea Jan. 30, 2024, 7:03 a.m. UTC | #4
Hi, Wolfram,

On 29.01.2024 12:55, Wolfram Sang wrote:
> Hi Claudiu,
> 
> but one thing I can ask already:
> 
>> Investigating it on RZ/G3S lead to the conclusion that every time the issue
>> is reproduced all the probed TAPs are OK. According to datasheet, when this
>> happens the change point of data need to be considered for tuning.
> 
> Yes, "considered" means here it should be *avoided*.

My understanding was the other way around from this statement found in
RZ/G3S hw manual:

"If all of the TAP [i] is OK, the sampling clock position is selected by
identifying the change point of data.
 Change point of the data can be found in the value of SCC_SMPCMP register.
Usage example is Section 33.8.3, Change
 point of the input data."

> 
>> Previous code considered the change point of data happens when the content
>> of the SMPCMP register is zero. According to RZ/V2M hardware manual,
> 
> When SMPCMP is zero, there is *no* change point. Which is good.

That was my understanding, too.

> 
>> chapter "Change Point of the Input Data" (as this is the most clear
>> description that I've found about change point of the input data and all
>> RZ hardware manual are similar on this chapter),
> 
> I also have a chapter named like this. If you check the diagram, change
> point is between TAP2 and 3, so the suggested TAP to use is 6 or 7. As
> far away as possible from the change point.

My understanding was different here because of the following hw manual statement:

"As the width of the input data is 1 (UI), select TAP6 or TAP7 which is

*the median* of next TAP3 from TAP3"

I understand from this that the median value should be considered here.

> 
>> at the time of tuning,
>> data is captured by the previous and next TAPs and the result is stored in
>> the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
>> If there is a mismatch b/w the previous and the next TAPs, it indicates
>> that there is a change point of the input data.
> 
> This is correct.
> 
>> To comply with this, the code checks if this mismatch is present and
>> updates the priv->smpcmp mask.
> 
> That means you select the "change point" instead of avoiding it?
> 
>> This change has been checked on the devices with the following DTSes by
>> doing 50 consecutive reboots and checking for the tuning failure message:
> 
> Okay, you might not have a failure message, but you might have selected
> the worst TAP. Or?
> 
>> +			if (cmpngu_data != cmpngd_data)
>> +				set_bit(i, priv->smpcmp);
> 
> Really looks like you select the change point instead of avoiding it.

Looking again at it and digesting what you said about the tuning here, yes
it seems I did it this way.

> 
> However, with some SD cards, I also see the EIO error you see. So, there
> might be room to improve TAP selection when all TAPs are good. I need to
> check if this is really is the same case for the SD cards in question.

Maybe better would be to change this condition:

			if (cmpngu_data != cmpngd_data)
				set_bit(i, priv->smpcmp);

like this:
			if (cmpngu_data == cmpngd_data)
				set_bit(i, priv->smpcmp);

?

I need to check it, though.

Thanks for your input,
Claudiu Beznea

> 
> Happy hacking,
> 
>    Wolfram
>
Wolfram Sang Jan. 30, 2024, 7:26 a.m. UTC | #5
Hi Claudiu,

> My understanding was the other way around from this statement found in
> RZ/G3S hw manual:
> 
> "If all of the TAP [i] is OK, the sampling clock position is selected by
> identifying the change point of data.

Yes, it is easy to misunderstand. It should add "and avoid it" or
something. I got an internal diagram which makes it more clear. I just
asked if I can share it with you.

> > I also have a chapter named like this. If you check the diagram, change
> > point is between TAP2 and 3, so the suggested TAP to use is 6 or 7. As
> > far away as possible from the change point.
> 
> My understanding was different here as of the following hw manual statement:
> 
> "As the width of the input data is 1 (UI), select TAP6 or TAP7 which is
> 
> *the median* of next TAP3 from TAP3"
> 
> I understand from this that the median value should be considered here.

Sorry, can't follow you here. "Select TAP6 or TAP7" is clear to me. But
it doesn't really matter why it was misleading...

> > However, with some SD cards, I also see the EIO error you see. So, there
> > might be room to improve TAP selection when all TAPs are good. I need to
> > check if this is really is the same case for the SD cards in question.
> 
> Maybe better would be to change this condition:
> 
> 			if (cmpngu_data != cmpngd_data)
> 				set_bit(i, priv->smpcmp);
> 
> like this:
> 			if (cmpngu_data == cmpngd_data)
> 				set_bit(i, priv->smpcmp);
> 
> ?
> 
> I need to check it, though.

But isn't it equal to the current code then? (Except for one thing: the
smpcmp bit is only set when there is no cmd error. I need to double
check but I think I like that.)

Happy hacking,

   Wolfram
Wolfram Sang Jan. 30, 2024, 7:31 a.m. UTC | #6
> But isn't it equal to the current code then? (Except for one thing: the
> smpcmp bit is only set when there is no cmd error. I need to double
> check but I think I like that.)

I double checked, I really like it. I'd just invert the logic. Pseudo
code:

if (!cmd_error) {
	if (SMPCMP == 0)
		set_bit
} else {
	mmc_abort_tuning()
}
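
A stand-alone model of the pseudo code above, for illustration only. The
struct and function names are hypothetical and nothing here touches real
hardware: the register value and command status are passed in, and the
abort is merely counted at the point where the driver would call
mmc_send_abort_tuning(). It assumes at most 32 TAPs so that one 32-bit
word can hold the mask.

```c
#include <stdint.h>

/* Hypothetical per-scan bookkeeping; not a driver structure. */
struct tap_scan {
	uint32_t smpcmp_ok;	/* bit i set: TAP i reported no change point */
	unsigned int aborts;	/* how many times tuning would be aborted */
};

/*
 * Record the outcome of probing one TAP.
 * cmd_error:  non-zero if the tuning command itself failed.
 * smpcmp_reg: raw SMPCMP value, only meaningful when cmd_error is zero.
 */
static void tap_scan_record(struct tap_scan *scan, unsigned int tap,
			    int cmd_error, uint32_t smpcmp_reg)
{
	if (!cmd_error) {
		/* SMPCMP is only evaluated on the success path. */
		if (smpcmp_reg == 0)
			scan->smpcmp_ok |= UINT32_C(1) << tap;
	} else {
		/* The driver would call mmc_send_abort_tuning() here. */
		scan->aborts++;
	}
}
```

The braces make explicit that the abort belongs to the command-error case,
which the indentation of the pseudo code implies.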
Claudiu Beznea Jan. 30, 2024, 7:51 a.m. UTC | #7
On 30.01.2024 09:26, Wolfram Sang wrote:
> Hi Claudiu,
> 
>> My understanding was the other way around from this statement found in
>> RZ/G3S hw manual:
>>
>> "If all of the TAP [i] is OK, the sampling clock position is selected by
>> identifying the change point of data.
> 
> Yes, it is easy to misunderstand. It should add "and avoid it" or
> something. I got an internal diagram which makes it more clear. I just
> asked if I can share it with you.
> 
>>> I also have a chapter named like this. If you check the diagram, change
>>> point is between TAP2 and 3, so the suggested TAP to use is 6 or 7. As
>>> far away as possible from the change point.
>>
>> My understanding was different here as of the following hw manual statement:
>>
>> "As the width of the input data is 1 (UI), select TAP6 or TAP7 which is
>>
>> *the median* of next TAP3 from TAP3"
>>
>> I understand from this that the median value should be considered here.
> 
> Sorry, can't follow you here. "Select TAP6 or TAP7" is clear to me. But
> it doesn't really matter why it was misleading...
> 
>>> However, with some SD cards, I also see the EIO error you see. So, there
>>> might be room to improve TAP selection when all TAPs are good. I need to
>>> check if this is really is the same case for the SD cards in question.
>>
>> Maybe better would be to change this condition:
>>
>> 			if (cmpngu_data != cmpngd_data)
>> 				set_bit(i, priv->smpcmp);
>>
>> like this:
>> 			if (cmpngu_data == cmpngd_data)
>> 				set_bit(i, priv->smpcmp);
>>
>> ?
>>
>> I need to check it, though.
> 
> But isn't it equal to the current code then? (Except for one thing: the

From my debugging session I remember the SMPCMP was not zero and this led
to my failure.

I'm not sure (and I don't remember from my debugging session) if CMPNGU and
CMPNGD are identical after the change point of the input data (CMPNGU  !=
CMPNGD) has been signaled by the controller. I need to check it.

> smpcmp bit is only set when there is no cmd error. I need to double
> check but I think I like that.)
> 
> Happy hacking,
> 
>    Wolfram
>
Patch

diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index c675dec587ef..0090228a5e8f 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -18,6 +18,7 @@ 
  *
  */
 
+#include <linux/bitfield.h>
 #include <linux/clk.h>
 #include <linux/delay.h>
 #include <linux/iopoll.h>
@@ -312,6 +313,8 @@  static int renesas_sdhi_start_signal_voltage_switch(struct mmc_host *mmc,
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_REQDOWN	BIT(8)
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_REQUP	BIT(24)
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_ERR	(BIT(8) | BIT(24))
+#define SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGU_DATA	GENMASK(23, 16)
+#define SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGD_DATA	GENMASK(7, 0)
 
 #define SH_MOBILE_SDHI_SCC_TMPPORT2_HS400OSEL	BIT(4)
 #define SH_MOBILE_SDHI_SCC_TMPPORT2_HS400EN	BIT(31)
@@ -641,7 +644,14 @@  static int renesas_sdhi_select_tuning(struct tmio_mmc_host *host)
 	 * identifying the change point of data.
 	 */
 	if (bitmap_full(priv->taps, taps_size)) {
-		bitmap = priv->smpcmp;
+		/*
+		 * On some setups it happens that all TAPS are OK but
+		 * no change point of data. Any tap should be OK for this.
+		 */
+		if (bitmap_empty(priv->smpcmp, taps_size))
+			bitmap = priv->taps;
+		else
+			bitmap = priv->smpcmp;
 		min_tap_row = 1;
 	} else {
 		bitmap = priv->taps;
@@ -706,11 +716,18 @@  static int renesas_sdhi_execute_tuning(struct mmc_host *mmc, u32 opcode)
 		if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0)
 			set_bit(i, priv->taps);
 
-		if (sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP) == 0)
-			set_bit(i, priv->smpcmp);
-
-		if (cmd_error)
+		if (cmd_error) {
 			mmc_send_abort_tuning(mmc, opcode);
+		} else {
+			u32 val, cmpngu_data, cmpngd_data;
+
+			val = sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP);
+			cmpngu_data = FIELD_GET(SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGU_DATA, val);
+			cmpngd_data = FIELD_GET(SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGD_DATA, val);
+
+			if (cmpngu_data != cmpngd_data)
+				set_bit(i, priv->smpcmp);
+		}
 	}
 
 	ret = renesas_sdhi_select_tuning(host);