Message ID | 20240205112702.213050-1-claudiu.beznea.uj@bp.renesas.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v3] mmc: renesas_sdhi: Fix change point of data handling |
Hi Claudiu,

thanks for the updated version!

> To comply with this, the patch checks if this mismatch is present and
> updates the priv->smpcmp mask only if it is not. Previous code checked if
> the value of SMPCMP register was zero. However, on RZ/G3S, this leads to
> failures as it may happen, e.g., the following:
> CMPNGU=0x0e, CMPNGD=0x0e, SMPCMP=0x000e000e.

Can you add the current TAP number (variable 'i') to this printout?

According to my understanding, we should only mark this TAP good if it
is in the range 5-7. I need to double check with Renesas, though.

> Along with it, as mmc_send_tuning() may return with error even before the
> MMC command reaches the controller (and because at that point cmd_error = 0),
> the update of priv->smpcmp mask has been done only if the return value of
> mmc_send_tuning(mmc, opcode, &cmd_error) is 0 (success).

This is a needed change, for sure.

> This change has been checked on the devices with the following DTSes by
> doing 100 consecutive boots and checking for the tuning failure message:

Boot failure is one test. Read/write tests should be another, I think.
Because if we select a bad TAP, bad things might happen later. To reduce
the amount of testing, read/write testing could only be triggered if the
new code path was executed?

Happy hacking,

   Wolfram
Hi, Wolfram,

On 05.02.2024 15:07, Wolfram Sang wrote:
> Hi Claudiu,
>
> thanks for the updated version!
>
>> To comply with this, the patch checks if this mismatch is present and
>> updates the priv->smpcmp mask only if it is not. Previous code checked if
>> the value of SMPCMP register was zero. However, on RZ/G3S, this leads to
>> failures as it may happen, e.g., the following:
>> CMPNGU=0x0e, CMPNGD=0x0e, SMPCMP=0x000e000e.
>
> Can you add the current TAP number (variable 'i') to this printout?

This is a snapshot I saved from my previous debugging session (where I
also checked the values of cmpngd and cmpngu):

i=0, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=1, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=2, cmpngu=0000000e, cmpngd=0000000e, smpcmp=000e000e
i=3, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=4, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002
i=5, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff
i=6, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000
i=7, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=8, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=9, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=10, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=11, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=12, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002
i=13, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff
i=14, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000
i=15, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000

This is printed in this for loop:
https://elixir.bootlin.com/linux/latest/source/drivers/mmc/host/renesas_sdhi_core.c#L700

> According to my understanding, we should only mark this TAP good if it
> is in the range 5-7. I need to double check with Renesas, though.

OK, my understanding is that it should be in the middle (beginning being
the tap that triggered change point of the input data, end being the next
tap with the same ID). This is what I understand from this: "As the width
of the input data is 1 (UI), select TAP6 or TAP7 which is *the median* of
next TAP3 from TAP3."

>
>> Along with it, as mmc_send_tuning() may return with error even before the
>> MMC command reaches the controller (and because at that point cmd_error = 0),
>> the update of priv->smpcmp mask has been done only if the return value of
>> mmc_send_tuning(mmc, opcode, &cmd_error) is 0 (success).
>
> This is a needed change, for sure.
>
>> This change has been checked on the devices with the following DTSes by
>> doing 100 consecutive boots and checking for the tuning failure message:
>
> Boot failure is one test. Read/write tests should be another, I think.

OK, I'll also try read/write. Do you have something particular in mind?

> Because if we select a bad TAP, bad things might happen later. To reduce
> the amount of testing, read/write testing could only be triggered if the
> new code path was executed?

I'm not sure how to trigger that (or maybe I haven't understood your
statement...)

Thank you,
Claudiu Beznea

>
> Happy hacking,
>
>    Wolfram
>
> > According to my understanding, we should only mark this TAP good if it
> > is in the range 5-7. I need to double check with Renesas, though.
>
> OK, my understanding is that it should be in the middle (beginning being
> the tap that triggered change point of the input data, end being the next
> tap with the same ID). This is what I understand from this: "As the width
> of the input data is 1 (UI), select TAP6 or TAP7 which is *the median* of
> next TAP3 from TAP3."

Yes, I agree. With 0x0e, that means TAP1+2+3 are changing points and we
should be far away from them, like 5-7.

But: I am still waiting for Renesas to answer my questions regarding
SMPCMP. I'd like to get that first, so we have clear facts then.

> > Boot failure is one test. Read/write tests should be another, I think.
>
> OK, I'll also try read/write. Do you have something particular in mind?

Nope. Just consistency checks.

> > Because if we select a bad TAP, bad things might happen later. To reduce
> > the amount of testing, read/write testing could only be triggered if the
> > new code path was executed?
>
> I'm not sure how to trigger that (or maybe I haven't understood your
> statement...)

I thought something along the lines of:

- print out when you needed SMPCMP to select a TAP
- check the log for that printout
- if (printout) do read_write_tests

Dunno if that makes sense with your test setup.
Hi, Wolfram,

On 05.02.2024 16:51, Wolfram Sang wrote:
>
>>> According to my understanding, we should only mark this TAP good if it
>>> is in the range 5-7. I need to double check with Renesas, though.
>>
>> OK, my understanding is that it should be in the middle (beginning being
>> the tap that triggered change point of the input data, end being the next
>> tap with the same ID). This is what I understand from this: "As the width
>> of the input data is 1 (UI), select TAP6 or TAP7 which is *the median* of
>> next TAP3 from TAP3."
>
> Yes, I agree. With 0x0e, that means TAP1+2+3 are changing points and we
> should be far away from them, like 5-7.

As far as I understand, the TAP where cmpngu = 0x0e and cmpngd = 0x0e is
not considered a change point of the input data. For that to happen,
cmpngu != cmpngd would have to hold.

From this snapshot, datasheet and our discussions:

i=0, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=1, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=2, cmpngu=0000000e, cmpngd=0000000e, smpcmp=000e000e
i=3, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
*i=4, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
*i=5, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
*i=6, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
i=7, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=8, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=9, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=10, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
i=11, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
*i=12, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
*i=13, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
*i=14, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
i=15, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000

I understand that TAP4,5,6 are change point of the input data and
TAP8,0,1,2,3 are candidates for being selected, TAP 1,2 being the best
(please correct me if I'm wrong).

>
> But: I am still waiting for Renesas to answer my questions regarding
> SMPCMP. I'd like to get that first, so we have clear facts then.
>
>>> Boot failure is one test. Read/write tests should be another, I think.
>>
>> OK, I'll also try read/write. Do you have something particular in mind?
>
> Nope. Just consistency checks.
>
>>> Because if we select a bad TAP, bad things might happen later. To reduce
>>> the amount of testing, read/write testing could only be triggered if the
>>> new code path was executed?
>>
>> I'm not sure how to trigger that (or maybe I haven't understood your
>> statement...)
>
> I thought something along the lines of:
>
> - print out when you needed SMPCMP to select a TAP

On my device (RZ/G3S), which initially triggered "mmc0: tuning execution
failed" at probe, with this patch applied (when doing read/write tests)
there are many moments when cmpngu == cmpngd and thus the smpcmp bitmask
is populated.

With RZ/G3S + rootfs on eMMC and this patch I did the following read/write
test:

root@smarc-rzg3s:~# dd if=/dev/random of=out bs=1024 count=1048576
1048576+0 records in
1048576+0 records out
root@smarc-rzg3s:~#
root@smarc-rzg3s:~# dd if=out of=test bs=1024 count=1048576
1048576+0 records in
1048576+0 records out
root@smarc-rzg3s:~#
root@smarc-rzg3s:~#
root@smarc-rzg3s:~#
root@smarc-rzg3s:~# md5sum out test
b053723af63801e665959d48cb7bd8e6  out
b053723af63801e665959d48cb7bd8e6  test

Do you consider this enough?

Thank you,
Claudiu Beznea

> - check the log for that printout
> - if (printout) do read_write_tests
>
> Dunno if that makes sense with your test setup.
>
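For readers following the numbers in the dump, a minimal userspace sketch of how a raw SMPCMP value splits into the fields discussed here may help. The bit positions are taken from the defines in the patch (CMPNGU in bits 23..16, CMPNGD in bits 7..0, CMD_REQUP in bit 24, CMD_REQDOWN in bit 8). The struct and function names (smpcmp_fields, smpcmp_decode, tap_ok_zero, tap_ok_equal) are hypothetical and not part of the driver.

```c
#include <stdint.h>

/* Bit layout as given by the defines in the patch under review. */
#define SMPCMP_CMD_REQDOWN	(1u << 8)
#define SMPCMP_CMD_REQUP	(1u << 24)
#define SMPCMP_CMPNGU_DATA	0x00ff0000u	/* GENMASK(23, 16) */
#define SMPCMP_CMPNGD_DATA	0x000000ffu	/* GENMASK(7, 0) */

/* Hypothetical container for the decoded register fields. */
struct smpcmp_fields {
	uint8_t cmpngu;		/* data mismatch bits, "up" side */
	uint8_t cmpngd;		/* data mismatch bits, "down" side */
	int cmd_requp;		/* non-zero if bit 24 is set */
	int cmd_reqdown;	/* non-zero if bit 8 is set */
};

/* Split a raw SMPCMP value into its fields. */
struct smpcmp_fields smpcmp_decode(uint32_t val)
{
	struct smpcmp_fields f;

	f.cmpngu = (uint8_t)((val & SMPCMP_CMPNGU_DATA) >> 16);
	f.cmpngd = (uint8_t)(val & SMPCMP_CMPNGD_DATA);
	f.cmd_requp = (val & SMPCMP_CMD_REQUP) != 0;
	f.cmd_reqdown = (val & SMPCMP_CMD_REQDOWN) != 0;
	return f;
}

/* Check used before the patch: the whole register must read zero. */
int tap_ok_zero(uint32_t val)
{
	return val == 0;
}

/* Check proposed by the v3 patch: both data fields must be equal. */
int tap_ok_equal(uint32_t val)
{
	struct smpcmp_fields f = smpcmp_decode(val);

	return f.cmpngu == f.cmpngd;
}
```

With the value seen at i=2 (0x000e000e), tap_ok_zero() rejects the TAP while tap_ok_equal() accepts it. With the value seen at i=5 (0x000001ff), both checks reject it.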
Hi Claudiu,

I got more information about SMPCMP now. I had a misunderstanding there.
According to your patch description, you might have the same
misunderstanding? Let me quote again:

===
RZ hardware manual are similar on this chapter), at the time of tuning,
data is captured by the previous and next TAPs and the result is stored in
the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
===

It is not the previous and next TAP but the previous and next clock
cycle using the *same* TAP. And the bits in the register describe if
there was a mismatch in the data bits across these clock cycles.

So, we really want SMPCMP to be 0 because the data should be stable
across all three clock cycles of the same TAP.

> As far as I understand, the TAP where cmpngu = 0x0e and cmpngd = 0x0e is
> not considered a change point of the input data. For that to happen,
> cmpngu != cmpngd would have to hold.

I am not sure you can assume that cmpngu != cmpngd is always true for a
change point. I'd think it is likely often the case. But always? I am
not convinced. But I am convinced that if SMPCMP is 0, this is a good
TAP because it was stable over these clock cycles.

> From this snapshot, datasheet and our discussions:
>
> i=0, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=1, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=2, cmpngu=0000000e, cmpngd=0000000e, smpcmp=000e000e
> i=3, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> *i=4, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
> *i=5, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
> *i=6, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
> i=7, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=8, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=9, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=10, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> i=11, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
> *i=12, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
> *i=13, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
> *i=14, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
> i=15, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>
> I understand that TAP4,5,6 are change point of the input data and
> TAP8,0,1,2,3 are candidates for being selected, TAP 1,2 being the best
> (please correct me if I'm wrong).

I agree that TAP4-6 are the change point. TAP2 could be a candidate. I
dunno why SMPCMP is non-zero at i == 2, maybe some glitch due to noise
on the board?

I do really wonder why probing failed, though? TAP1 sounds like a good
choice as well. I mean we consider SMPCMP only if all TAPs are good. So,
if probing fails, that means that SMPCMP was non-zero all the time?

That being said, our code to select the best TAP from SMPCMP is really
not considering the change point :( It just picks the first one where
SMPCMP is 0. We are not checking where the change point is and trying to
be as far away from it as possible.

> root@smarc-rzg3s:~# md5sum out test
> b053723af63801e665959d48cb7bd8e6  out
> b053723af63801e665959d48cb7bd8e6  test
>
> Do you consider this enough?

Yes, if done 100 times ;)

I hope this mail was helpful?

Thanks and happy hacking,

   Wolfram
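The idea of staying as far as possible from the change point can be sketched as picking the centre of the longest run of good TAPs, treating the TAPs as a ring. This is a standalone userspace illustration, not the driver's selection code. The name pick_centre_tap is hypothetical, and the 8-TAP ring used in the examples is an assumption read from the way the dump repeats after i=7.

```c
#include <stdint.h>

/*
 * Hypothetical helper: given a bitmask of good TAPs (bit n set means
 * TAP n is good) on a ring of tap_num TAPs, return the TAP in the
 * centre of the longest run of consecutive good TAPs, or -1 if no TAP
 * is good. Runs may wrap around from TAP tap_num-1 to TAP 0.
 */
int pick_centre_tap(uint32_t good, unsigned int tap_num)
{
	unsigned int best_start = 0, best_len = 0;
	unsigned int start, len;

	if (tap_num == 0 || tap_num > 32)
		return -1;

	for (start = 0; start < tap_num; start++) {
		if (!(good & (1u << start)))
			continue;

		/* Count consecutive good TAPs from 'start', wrapping around. */
		for (len = 0; len < tap_num; len++) {
			if (!(good & (1u << ((start + len) % tap_num))))
				break;
		}

		/* A start inside a run yields a shorter count, so the
		 * maximum is always reached at the true start of a run. */
		if (len > best_len) {
			best_len = len;
			best_start = start;
		}
	}

	if (best_len == 0)
		return -1;

	/* Centre of the run; for even lengths take the lower middle. */
	return (int)((best_start + (best_len - 1) / 2) % tap_num);
}
```

For the dump quoted above, counting only TAPs whose SMPCMP was 0 in both rounds gives the mask 0x8b (TAPs 0, 1, 3, 7) and the sketch returns TAP 0. Counting TAP2 as good as well (mask 0x8f) makes the run 7, 0, 1, 2, 3 and the sketch returns TAP 1.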
Hi, Wolfram,

On 08.02.2024 02:56, Wolfram Sang wrote:
> Hi Claudiu,
>
> I got more information about SMPCMP now. I had a misunderstanding there.
> According to your patch description, you might have the same
> misunderstanding? Let me quote again:
>
> ===
> RZ hardware manual are similar on this chapter), at the time of tuning,
> data is captured by the previous and next TAPs and the result is stored in
> the SMPCMP register (previous TAP in bits 22..16, next TAP in bits 7..0).
> ===
>
> It is not the previous and next TAP but the previous and next clock
> cycle using the *same* TAP. And the bits in the register describe if
> there was a mismatch in the data bits across these clock cycles.

That's new to me; it's not described in the HW manual (or at least I
haven't found it).

>
> So, we really want SMPCMP to be 0 because the data should be stable
> across all three clock cycles of the same TAP.

So, it means the issue should be somewhere else on my setup.

>
>> As far as I understand, the TAP where cmpngu = 0x0e and cmpngd = 0x0e is
>> not considered a change point of the input data. For that to happen,
>> cmpngu != cmpngd would have to hold.
>
> I am not sure you can assume that cmpngu != cmpngd is always true for a
> change point. I'd think it is likely often the case. But always? I am
> not convinced.

That was my understanding from the HW manual, and since it fixed my issue
I considered it valid at the point I wrote this statement. Maybe we need
to understand this?

> But I am convinced that if SMPCMP is 0, this is a good
> TAP because it was stable over these clock cycles.
>
>> From this snapshot, datasheet and our discussions:
>>
>> i=0, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=1, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=2, cmpngu=0000000e, cmpngd=0000000e, smpcmp=000e000e
>> i=3, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> *i=4, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
>> *i=5, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
>> *i=6, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
>> i=7, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=8, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=9, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=10, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> i=11, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>> *i=12, cmpngu=00000000, cmpngd=00000002, smpcmp=00000002*
>> *i=13, cmpngu=00000000, cmpngd=000000ff, smpcmp=000001ff*
>> *i=14, cmpngu=000000ff, cmpngd=00000000, smpcmp=01ff0000*
>> i=15, cmpngu=00000000, cmpngd=00000000, smpcmp=00000000
>>
>> I understand that TAP4,5,6 are change point of the input data and
>> TAP8,0,1,2,3 are candidates for being selected, TAP 1,2 being the best
>> (please correct me if I'm wrong).
>
> I agree that TAP4-6 are the change point. TAP2 could be a candidate. I
> dunno why SMPCMP is non-zero at i == 2, maybe some glitch due to noise
> on the board?

Hm... it's worth considering...

>
> I do really wonder why probing failed, though? TAP1 sounds like a good
> choice as well. I mean we consider SMPCMP only if all TAPs are good. So,
> if probing fails, that means that SMPCMP was non-zero all the time?

Yes, that was my finding as well on my setup, which led to this patch.

If we take as an example the snapshot I dropped here in a previous email,
and do not consider this patch, the code at [1] should clear the bit for
TAP2 in the smpcmp mask because in the 1st round SMPCMP was not zero (but
0x000e000e) and in the 2nd round it was zero.

[1] https://elixir.bootlin.com/linux/latest/source/drivers/mmc/host/renesas_sdhi_core.c#L629

>
> That being said, our code to select the best TAP from SMPCMP is really
> not considering the change point :( It just picks the first one where
> SMPCMP is 0.

Hm... I thought the code at [2] selects the TAP in the middle (in the
snapshot I pointed to, TAP1).

[2] https://elixir.bootlin.com/linux/latest/source/drivers/mmc/host/renesas_sdhi_core.c#L656

> We are not checking where the change point is and trying to
> be as far away from it as possible.
>
>> root@smarc-rzg3s:~# md5sum out test
>> b053723af63801e665959d48cb7bd8e6  out
>> b053723af63801e665959d48cb7bd8e6  test
>>
>> Do you consider this enough?
>
> Yes, if done 100 times ;)

This may take a while...

>
> I hope this mail was helpful?

The tuning procedure is better understood now. But I'm not sure in which
direction I should dig further... :)

Thank you for the details and patience,
Claudiu Beznea

>
> Thanks and happy hacking,
>
>    Wolfram
>
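The merge step described in this mail (a TAP only counts if it was good in both tuning rounds) can be sketched in userspace as follows. It assumes the per-TAP results sit in one bitmap with round 1 in bits 0..tap_num-1 and round 2 in the next tap_num bits, which matches how the index i runs from 0 to 15 in the dump. merge_rounds is a hypothetical helper for illustration, not the driver code referenced at [1].

```c
#include <stdint.h>

/*
 * Hypothetical helper: 'bitmap' holds the results of two tuning rounds,
 * round 1 in bits [0, tap_num) and round 2 in bits [tap_num, 2*tap_num).
 * Returns a tap_num-bit mask in which a TAP is set only if it was
 * marked good in both rounds.
 */
uint32_t merge_rounds(uint32_t bitmap, unsigned int tap_num)
{
	uint32_t mask, round1, round2;

	/* Two rounds must fit into 32 bits. */
	if (tap_num == 0 || tap_num > 16)
		return 0;

	mask = (1u << tap_num) - 1u;
	round1 = bitmap & mask;
	round2 = (bitmap >> tap_num) & mask;

	return round1 & round2;
}
```

Fed with the SMPCMP == 0 results from the dump (0x8f8b for i=0..15, assuming 8 TAPs), TAP2 drops out because it was bad in round 1, leaving 0x8b (TAPs 0, 1, 3, 7).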
diff --git a/drivers/mmc/host/renesas_sdhi_core.c b/drivers/mmc/host/renesas_sdhi_core.c
index c675dec587ef..8871521e1274 100644
--- a/drivers/mmc/host/renesas_sdhi_core.c
+++ b/drivers/mmc/host/renesas_sdhi_core.c
@@ -18,6 +18,7 @@
  *
  */
 
+#include <linux/bitfield.h>
 #include <linux/clk.h>
 #include <linux/delay.h>
 #include <linux/iopoll.h>
@@ -312,6 +313,8 @@ static int renesas_sdhi_start_signal_voltage_switch(struct mmc_host *mmc,
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_REQDOWN	BIT(8)
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_REQUP	BIT(24)
 #define SH_MOBILE_SDHI_SCC_SMPCMP_CMD_ERR	(BIT(8) | BIT(24))
+#define SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGU_DATA	GENMASK(23, 16)
+#define SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGD_DATA	GENMASK(7, 0)
 
 #define SH_MOBILE_SDHI_SCC_TMPPORT2_HS400OSEL	BIT(4)
 #define SH_MOBILE_SDHI_SCC_TMPPORT2_HS400EN	BIT(31)
@@ -703,11 +706,18 @@ static int renesas_sdhi_execute_tuning(struct mmc_host *mmc, u32 opcode)
 		/* Set sampling clock position */
 		sd_scc_write32(host, priv, SH_MOBILE_SDHI_SCC_TAPSET, i % priv->tap_num);
 
-		if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0)
-			set_bit(i, priv->taps);
+		if (mmc_send_tuning(mmc, opcode, &cmd_error) == 0) {
+			u32 val, cmpngu_data, cmpngd_data;
+
+			val = sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP);
+			cmpngu_data = FIELD_GET(SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGU_DATA, val);
+			cmpngd_data = FIELD_GET(SH_MOBILE_SDHI_SCC_SMPCMP_CMPNGD_DATA, val);
+
+			if (cmpngu_data == cmpngd_data)
+				set_bit(i, priv->smpcmp);
 
-		if (sd_scc_read32(host, priv, SH_MOBILE_SDHI_SCC_SMPCMP) == 0)
-			set_bit(i, priv->smpcmp);
+			set_bit(i, priv->taps);
+		}
 
 		if (cmd_error)
 			mmc_send_abort_tuning(mmc, opcode);