Message ID | 1492592434-81312-3-git-send-email-shawn.lin@rock-chips.com (mailing list archive) |
---|---|
State | New, archived |
Hi,

On Wed, Apr 19, 2017 at 2:00 AM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> Currently we unconditionally do tuning for each degree, which
> costs 900ms on each boot and resume.
>
> Someone may argue that this is a question of accuracy vs. time, but I
> would say it is a matter of what we need to decide for our boards.
> If we don't care about the time we spend at all, we could definitely
> do tuning for each degree. But when we need to improve the user
> experience, for instance by speeding up resume from S3, we should also
> have the right to do that. This patch adds parsing of
> "rockchip,default-num-phases" so that folks can specify the number of
> phases to tune. If it is not specified, 360 will be used as before.
>
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>
> ---
>
>  drivers/mmc/host/dw_mmc-rockchip.c | 48 ++++++++++++++++++++++++--------------
>  1 file changed, 30 insertions(+), 18 deletions(-)

No huge objection here, but I do remember we ended up at the 360
phases due to some of the craziness with dw_mmc delay elements on
rk3288. IIRC one of the big problems was that the delay elements
changed _a lot_ with the "LOGIC" voltage and we tweaked the voltage at
runtime for DDR DVFS. That imposed an extra need to be very accurate
on that SoC, at least on any board that was designed to support DDR
DVFS.

I also remember there being some weirdness on the Rockchip
implementation where there was a certain set of phases where the MMC
controller was essentially "blind". This blind spot was in the middle
of an otherwise good range of points. Unfortunately this blind spot
was somewhat hard to detect properly because it was not very big.
...the variability of the delay elements meant that there could be big
ranges where we weren't getting any good test coverage, but also the
fact that they changed with the LOGIC voltage might mean that we
weren't in the "blind" spot and then suddenly we were.

One other note is that I remember that the vast majority of time spent
tuning was dealing with "bad" phases, not dealing with "good" phases.
If you're looking to speed things up, maybe finding a way to make
"bad" phases fail faster would be wise? I think one of the reasons
bad phases failed so slowly is because the dw_mmc timeouts are all so
long.

Oh, and I guess one last note is that I have no idea if folks will
like the device bindings here. Part of me thinks it should be
something more "symbolic" like "rockchip,need-accurate-tuning" or
something like that. I guess I'd let the DT experts chime in.


So I guess to summarize:
* On rk3288 boards w/ DDR DVFS (or any other similar boards), 360
  seems to provide real benefit.
* On other boards, probably you can get by with fewer phases.

-Doug
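To make the blind-spot concern above concrete, here is a minimal user-space sketch. It is not driver code: the good window (100 to 250 degrees), the blind window (171 to 174 degrees), and the helper names `phase_is_good()`, `iteration_to_degrees()` and `tuned_phase()` are all made up for illustration. It does a plain sweep without the bad-phase skipping rule or wrap-around handling, and picks the middle of the longest good run the same way the driver does (start + len / 2).

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical board: phases 100..250 degrees sample correctly, except
 * for a narrow "blind" window at 171..174 degrees. All numbers here are
 * invented for illustration only.
 */
static bool phase_is_good(int degrees)
{
	if (degrees >= 171 && degrees <= 174)
		return false;
	return degrees >= 100 && degrees <= 250;
}

/* Same rounding as DIV_ROUND_UP(i * 360, num_phases) in the patch. */
static int iteration_to_degrees(int i, int num_phases)
{
	return (i * 360 + num_phases - 1) / num_phases;
}

/*
 * Simplified sweep: test every iteration, find the longest run of good
 * test points, and return the middle of that run in degrees.
 * Returns -1 if no test point was good.
 */
static int tuned_phase(int num_phases)
{
	int best_start = -1, best_len = 0, run_start = -1;

	assert(num_phases > 0);

	/* i == num_phases is a sentinel that closes a run still open at the end. */
	for (int i = 0; i <= num_phases; i++) {
		bool good = i < num_phases &&
			    phase_is_good(iteration_to_degrees(i, num_phases));

		if (good) {
			if (run_start < 0)
				run_start = i;
		} else if (run_start >= 0) {
			int len = i - run_start;

			if (len > best_len) {
				best_len = len;
				best_start = run_start;
			}
			run_start = -1;
		}
	}

	if (best_len == 0)
		return -1;
	return iteration_to_degrees(best_start + best_len / 2, num_phases);
}
```

With these invented numbers, `tuned_phase(360)` sees the blind window, splits the good range in two, and settles on 213 degrees. `tuned_phase(36)` only samples multiples of 10 degrees, never lands in 171..174, treats 100..250 as one unbroken range, and settles on 180 degrees, only a few degrees away from the blind window. A small shift of the delay elements could then put the chosen phase inside it.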
Hi Doug,

On 2017/4/20 4:19, Doug Anderson wrote:
> Hi,
>
> On Wed, Apr 19, 2017 at 2:00 AM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> Currently we unconditionally do tuning for each degree, which
>> costs 900ms on each boot and resume.
>>
>> Someone may argue that this is a question of accuracy vs. time, but I
>> would say it is a matter of what we need to decide for our boards.
>> If we don't care about the time we spend at all, we could definitely
>> do tuning for each degree. But when we need to improve the user
>> experience, for instance by speeding up resume from S3, we should also
>> have the right to do that. This patch adds parsing of
>> "rockchip,default-num-phases" so that folks can specify the number of
>> phases to tune. If it is not specified, 360 will be used as before.
>>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>
>> ---
>>
>>  drivers/mmc/host/dw_mmc-rockchip.c | 48 ++++++++++++++++++++++++--------------
>>  1 file changed, 30 insertions(+), 18 deletions(-)
>
> No huge objection here, but I do remember we ended up at the 360
> phases due to some of the craziness with dw_mmc delay elements on
> rk3288. IIRC one of the big problems was that the delay elements
> changed _a lot_ with the "LOGIC" voltage and we tweaked the voltage at
> runtime for DDR DVFS. That imposed an extra need to be very accurate
> on that SoC, at least on any board that was designed to support DDR
> DVFS.
>

Not just with vdd_logic but also with the SoC process.
To better guarantee accuracy, we first tune using delay elements and
then convert the result into a combination of degree + delay element.
But as the delay elements aren't accurate themselves, all the math we
do here is a trick.

> I also remember there being some weirdness on the Rockchip
> implementation where there was a certain set of phases where the MMC
> controller was essentially "blind". This blind spot was in the middle
> of an otherwise good range of points. Unfortunately this blind spot
> was somewhat hard to detect properly because it was not very big.
> ...the variability of the delay elements meant that there could be big
> ranges where we weren't getting any good test coverage, but also the
> fact that they changed with the LOGIC voltage might mean that we
> weren't in the "blind" spot and then suddenly we were.

I understand all of this, as I suffered from it when bringing up
RK3288.

>
> One other note is that I remember that the vast majority of time spent
> tuning was dealing with "bad" phases, not dealing with "good" phases.
> If you're looking to speed things up, maybe finding a way to make
> "bad" phases fail faster would be wise? I think one of the reasons
> bad phases failed so slowly is because the dw_mmc timeouts are all so
> long.

Good point. I haven't thought about speeding up the handling of bad
phases, but I will take a look at this.

>
> Oh, and I guess one last note is that I have no idea if folks will
> like the device bindings here. Part of me thinks it should be
> something more "symbolic" like "rockchip,need-accurate-tuning" or
> something like that. I guess I'd let the DT experts chime in.
>
>
> So I guess to summarize:
> * On rk3288 boards w/ DDR DVFS (or any other similar boards), 360
>   seems to provide real benefit.
> * On other boards, probably you can get by with fewer phases.
>

I would say it's a question of "900ms once" vs. "a smaller tuning cost
each time, with a higher chance of needing to retune".

(1) We do a more accurate tuning pass and spend 900ms.
Then we are less likely to need a retune later.

(2) We do a rough tuning pass and are more likely to need a retune later.

I would also say that this is a trade-off, and we can't say which
one is better. Obviously the "900ms" currently always happens in
time-critical paths, for instance booting and resuming from S3.

>
> -Doug
>
>
>
Hi,

On Wed, Apr 19, 2017 at 6:21 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> Hi Doug,
>
> On 2017/4/20 4:19, Doug Anderson wrote:
>>
>> Hi,
>>
>> On Wed, Apr 19, 2017 at 2:00 AM, Shawn Lin <shawn.lin@rock-chips.com>
>> wrote:
>>>
>>> Currently we unconditionally do tuning for each degree, which
>>> costs 900ms on each boot and resume.
>>>
>>> Someone may argue that this is a question of accuracy vs. time, but I
>>> would say it is a matter of what we need to decide for our boards.
>>> If we don't care about the time we spend at all, we could definitely
>>> do tuning for each degree. But when we need to improve the user
>>> experience, for instance by speeding up resume from S3, we should also
>>> have the right to do that. This patch adds parsing of
>>> "rockchip,default-num-phases" so that folks can specify the number of
>>> phases to tune. If it is not specified, 360 will be used as before.
>>>
>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>
>>> ---
>>>
>>>  drivers/mmc/host/dw_mmc-rockchip.c | 48 ++++++++++++++++++++++++--------------
>>>  1 file changed, 30 insertions(+), 18 deletions(-)
>>
>>
>> No huge objection here, but I do remember we ended up at the 360
>> phases due to some of the craziness with dw_mmc delay elements on
>> rk3288. IIRC one of the big problems was that the delay elements
>> changed _a lot_ with the "LOGIC" voltage and we tweaked the voltage at
>> runtime for DDR DVFS. That imposed an extra need to be very accurate
>> on that SoC, at least on any board that was designed to support DDR
>> DVFS.
>>
>
> Not just with vdd_logic but also with the SoC process.
> To better guarantee accuracy, we first tune using delay elements and
> then convert the result into a combination of degree + delay element.
> But as the delay elements aren't accurate themselves, all the math we
> do here is a trick.

Yup. I brought up the vdd logic specifically because it's something
that can make the phases change quite dramatically on the same machine
between the time you tuned and the time you used it.


>> I also remember there being some weirdness on the Rockchip
>> implementation where there was a certain set of phases where the MMC
>> controller was essentially "blind". This blind spot was in the middle
>> of an otherwise good range of points. Unfortunately this blind spot
>> was somewhat hard to detect properly because it was not very big.
>> ...the variability of the delay elements meant that there could be big
>> ranges where we weren't getting any good test coverage, but also the
>> fact that they changed with the LOGIC voltage might mean that we
>> weren't in the "blind" spot and then suddenly we were.
>
>
> I understand all of this, as I suffered from it when bringing up
> RK3288.
>
>>
>> One other note is that I remember that the vast majority of time spent
>> tuning was dealing with "bad" phases, not dealing with "good" phases.
>> If you're looking to speed things up, maybe finding a way to make
>> "bad" phases fail faster would be wise? I think one of the reasons
>> bad phases failed so slowly is because the dw_mmc timeouts are all so
>> long.
>
>
> Good point. I haven't thought about speeding up the handling of bad
> phases, but I will take a look at this.
>
>>
>> Oh, and I guess one last note is that I have no idea if folks will
>> like the device bindings here. Part of me thinks it should be
>> something more "symbolic" like "rockchip,need-accurate-tuning" or
>> something like that. I guess I'd let the DT experts chime in.
>>
>>
>> So I guess to summarize:
>> * On rk3288 boards w/ DDR DVFS (or any other similar boards), 360
>>   seems to provide real benefit.
>> * On other boards, probably you can get by with fewer phases.
>>
>
> I would say it's a question of "900ms once" vs. "a smaller tuning cost
> each time, with a higher chance of needing to retune".
>
> (1) We do a more accurate tuning pass and spend 900ms.
> Then we are less likely to need a retune later.
>
> (2) We do a rough tuning pass and are more likely to need a retune later.

Ah, interesting point. I haven't used newer versions of Linux much,
but I seem to remember that they will automatically retune sometimes
if they see errors. That makes your strategy a bit more valid.


> I would also say that this is a trade-off, and we can't say which
> one is better. Obviously the "900ms" currently always happens in
> time-critical paths, for instance booting and resuming from S3.

Is it really 900 ms? I don't quite remember it being that long, but I
could be remembering incorrectly.

-Doug
Hi Doug,

On 2017/4/25 0:18, Doug Anderson wrote:
> Hi,
>
> On Wed, Apr 19, 2017 at 6:21 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> Hi Doug,
>>
>> On 2017/4/20 4:19, Doug Anderson wrote:
>>>
>>> Hi,
>>>
>>> On Wed, Apr 19, 2017 at 2:00 AM, Shawn Lin <shawn.lin@rock-chips.com>
>>> wrote:
>>>>
>>>> Currently we unconditionally do tuning for each degree, which
>>>> costs 900ms on each boot and resume.
>>>>
>>>> Someone may argue that this is a question of accuracy vs. time, but I
>>>> would say it is a matter of what we need to decide for our boards.
>>>> If we don't care about the time we spend at all, we could definitely
>>>> do tuning for each degree. But when we need to improve the user
>>>> experience, for instance by speeding up resume from S3, we should also
>>>> have the right to do that. This patch adds parsing of
>>>> "rockchip,default-num-phases" so that folks can specify the number of
>>>> phases to tune. If it is not specified, 360 will be used as before.
>>>>
>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>
>>>> ---
>>>>
>>>>  drivers/mmc/host/dw_mmc-rockchip.c | 48 ++++++++++++++++++++++++--------------
>>>>  1 file changed, 30 insertions(+), 18 deletions(-)
>>>
>>>
>>> No huge objection here, but I do remember we ended up at the 360
>>> phases due to some of the craziness with dw_mmc delay elements on
>>> rk3288. IIRC one of the big problems was that the delay elements
>>> changed _a lot_ with the "LOGIC" voltage and we tweaked the voltage at
>>> runtime for DDR DVFS. That imposed an extra need to be very accurate
>>> on that SoC, at least on any board that was designed to support DDR
>>> DVFS.
>>>
>>
>> Not just with vdd_logic but also with the SoC process.
>> To better guarantee accuracy, we first tune using delay elements and
>> then convert the result into a combination of degree + delay element.
>> But as the delay elements aren't accurate themselves, all the math we
>> do here is a trick.
>
> Yup. I brought up the vdd logic specifically because it's something
> that can make the phases change quite dramatically on the same machine
> between the time you tuned and the time you used it.
>
>
>>> I also remember there being some weirdness on the Rockchip
>>> implementation where there was a certain set of phases where the MMC
>>> controller was essentially "blind". This blind spot was in the middle
>>> of an otherwise good range of points. Unfortunately this blind spot
>>> was somewhat hard to detect properly because it was not very big.
>>> ...the variability of the delay elements meant that there could be big
>>> ranges where we weren't getting any good test coverage, but also the
>>> fact that they changed with the LOGIC voltage might mean that we
>>> weren't in the "blind" spot and then suddenly we were.
>>
>>
>> I understand all of this, as I suffered from it when bringing up
>> RK3288.
>>
>>>
>>> One other note is that I remember that the vast majority of time spent
>>> tuning was dealing with "bad" phases, not dealing with "good" phases.
>>> If you're looking to speed things up, maybe finding a way to make
>>> "bad" phases fail faster would be wise? I think one of the reasons
>>> bad phases failed so slowly is because the dw_mmc timeouts are all so
>>> long.
>>
>>
>> Good point. I haven't thought about speeding up the handling of bad
>> phases, but I will take a look at this.
>>
>>>
>>> Oh, and I guess one last note is that I have no idea if folks will
>>> like the device bindings here. Part of me thinks it should be
>>> something more "symbolic" like "rockchip,need-accurate-tuning" or
>>> something like that. I guess I'd let the DT experts chime in.
>>>
>>>
>>> So I guess to summarize:
>>> * On rk3288 boards w/ DDR DVFS (or any other similar boards), 360
>>>   seems to provide real benefit.
>>> * On other boards, probably you can get by with fewer phases.
>>>
>>
>> I would say it's a question of "900ms once" vs. "a smaller tuning cost
>> each time, with a higher chance of needing to retune".
>>
>> (1) We do a more accurate tuning pass and spend 900ms.
>> Then we are less likely to need a retune later.
>>
>> (2) We do a rough tuning pass and are more likely to need a retune later.
>
> Ah, interesting point. I haven't used newer versions of Linux much,
> but I seem to remember that they will automatically retune sometimes
> if they see errors. That makes your strategy a bit more valid.
>
>
>> I would also say that this is a trade-off, and we can't say which
>> one is better. Obviously the "900ms" currently always happens in
>> time-critical paths, for instance booting and resuming from S3.
>
> Is it really 900 ms? I don't quite remember it being that long, but I
> could be remembering incorrectly.

The worst case I saw was nearly 900ms, but mostly we need 600ms there.

>
> -Doug
>
>
>
diff --git a/drivers/mmc/host/dw_mmc-rockchip.c b/drivers/mmc/host/dw_mmc-rockchip.c
index 372fb6e..c535526 100644
--- a/drivers/mmc/host/dw_mmc-rockchip.c
+++ b/drivers/mmc/host/dw_mmc-rockchip.c
@@ -25,6 +25,7 @@ struct dw_mci_rockchip_priv_data {
 	struct clk		*drv_clk;
 	struct clk		*sample_clk;
 	int			default_sample_phase;
+	int			num_phases;
 };
 
 static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
@@ -133,8 +134,8 @@ static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
 	}
 }
 
-#define NUM_PHASES			360
-#define TUNING_ITERATION_TO_PHASE(i)	(DIV_ROUND_UP((i) * 360, NUM_PHASES))
+#define TUNING_ITERATION_TO_PHASE(i, num_phases) \
+		(DIV_ROUND_UP((i) * 360, num_phases))
 
 static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 {
@@ -159,13 +160,15 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		return -EIO;
 	}
 
-	ranges = kmalloc_array(NUM_PHASES / 2 + 1, sizeof(*ranges), GFP_KERNEL);
+	ranges = kmalloc_array(priv->num_phases / 2 + 1,
+			       sizeof(*ranges), GFP_KERNEL);
 	if (!ranges)
 		return -ENOMEM;
 
 	/* Try each phase and extract good ranges */
-	for (i = 0; i < NUM_PHASES; ) {
-		clk_set_phase(priv->sample_clk, TUNING_ITERATION_TO_PHASE(i));
+	for (i = 0; i < priv->num_phases; ) {
+		clk_set_phase(priv->sample_clk,
+			      TUNING_ITERATION_TO_PHASE(i, priv->num_phases));
 
 		v = !mmc_send_tuning(mmc, opcode, NULL);
 
@@ -179,7 +182,7 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		if (v) {
 			ranges[range_count-1].end = i;
 			i++;
-		} else if (i == NUM_PHASES - 1) {
+		} else if (i == priv->num_phases - 1) {
 			/* No extra skipping rules if we're at the end */
 			i++;
 		} else {
@@ -188,11 +191,11 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 			 * one since testing bad phases is slow. Skip
 			 * 20 degrees.
 			 */
-			i += DIV_ROUND_UP(20 * NUM_PHASES, 360);
+			i += DIV_ROUND_UP(20 * priv->num_phases, 360);
 
 			/* Always test the last one */
-			if (i >= NUM_PHASES)
-				i = NUM_PHASES - 1;
+			if (i >= priv->num_phases)
+				i = priv->num_phases - 1;
 		}
 
 		prev_v = v;
@@ -210,7 +213,7 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		range_count--;
 	}
 
-	if (ranges[0].start == 0 && ranges[0].end == NUM_PHASES - 1) {
+	if (ranges[0].start == 0 && ranges[0].end == priv->num_phases - 1) {
 		clk_set_phase(priv->sample_clk, priv->default_sample_phase);
 		dev_info(host->dev, "All phases work, using default phase %d.",
 			 priv->default_sample_phase);
@@ -222,7 +225,7 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		int len = (ranges[i].end - ranges[i].start + 1);
 
 		if (len < 0)
-			len += NUM_PHASES;
+			len += priv->num_phases;
 
 		if (longest_range_len < len) {
 			longest_range_len = len;
@@ -230,25 +233,30 @@ static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		}
 
 		dev_dbg(host->dev, "Good phase range %d-%d (%d len)\n",
-			TUNING_ITERATION_TO_PHASE(ranges[i].start),
-			TUNING_ITERATION_TO_PHASE(ranges[i].end),
+			TUNING_ITERATION_TO_PHASE(ranges[i].start,
+						  priv->num_phases),
+			TUNING_ITERATION_TO_PHASE(ranges[i].end,
+						  priv->num_phases),
 			len
 		);
 	}
 
 	dev_dbg(host->dev, "Best phase range %d-%d (%d len)\n",
-		TUNING_ITERATION_TO_PHASE(ranges[longest_range].start),
-		TUNING_ITERATION_TO_PHASE(ranges[longest_range].end),
+		TUNING_ITERATION_TO_PHASE(ranges[longest_range].start,
+					  priv->num_phases),
+		TUNING_ITERATION_TO_PHASE(ranges[longest_range].end,
+					  priv->num_phases),
 		longest_range_len
 	);
 
 	middle_phase = ranges[longest_range].start + longest_range_len / 2;
-	middle_phase %= NUM_PHASES;
+	middle_phase %= priv->num_phases;
 	dev_info(host->dev, "Successfully tuned phase to %d\n",
-		 TUNING_ITERATION_TO_PHASE(middle_phase));
+		 TUNING_ITERATION_TO_PHASE(middle_phase, priv->num_phases));
 
 	clk_set_phase(priv->sample_clk,
-		      TUNING_ITERATION_TO_PHASE(middle_phase));
+		      TUNING_ITERATION_TO_PHASE(middle_phase,
+						priv->num_phases));
 
 free:
 	kfree(ranges);
@@ -264,6 +272,10 @@ static int dw_mci_rk3288_parse_dt(struct dw_mci *host)
 	if (!priv)
 		return -ENOMEM;
 
+	if (of_property_read_u32(np, "rockchip,default-num-phases",
+					&priv->num_phases))
+		priv->num_phases = 360;
+
 	if (of_property_read_u32(np, "rockchip,default-sample-phase",
 					&priv->default_sample_phase))
 		priv->default_sample_phase = 0;
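For readers who want to check the arithmetic in the diff, the two expressions that now depend on `num_phases` can be reproduced in user space. This is only a sketch: `DIV_ROUND_UP` is redefined locally to mirror the kernel macro, and the function names `iteration_to_phase()` and `bad_phase_skip()` are hypothetical wrappers that do not exist in the driver.

```c
#include <assert.h>

/* User-space stand-in for the kernel's DIV_ROUND_UP() macro. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Mirrors TUNING_ITERATION_TO_PHASE(i, num_phases) from the patch:
 * maps tuning iteration i (0 .. num_phases - 1) to a phase in degrees.
 */
static int iteration_to_phase(int i, int num_phases)
{
	assert(num_phases > 0);
	return DIV_ROUND_UP(i * 360, num_phases);
}

/*
 * Mirrors the "skip 20 degrees" step taken after a bad phase:
 * the number of iterations to advance, rounded up.
 */
static int bad_phase_skip(int num_phases)
{
	assert(num_phases > 0);
	return DIV_ROUND_UP(20 * num_phases, 360);
}
```

With `num_phases` set to 36 the sweep tests 0, 10, 20, ... 350 degrees, and a bad phase advances the loop by 2 iterations, which is still 20 degrees. For counts that do not divide 360 evenly, the rounding makes the skip longer than 20 degrees: with 7 phases, `bad_phase_skip(7)` is 1 iteration, which is roughly 51 degrees.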
Currently we unconditionally do tuning for each degree, which
costs 900ms on each boot and resume.

Someone may argue that this is a question of accuracy vs. time, but I
would say it is a matter of what we need to decide for our boards.
If we don't care about the time we spend at all, we could definitely
do tuning for each degree. But when we need to improve the user
experience, for instance by speeding up resume from S3, we should also
have the right to do that. This patch adds parsing of
"rockchip,default-num-phases" so that folks can specify the number of
phases to tune. If it is not specified, 360 will be used as before.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
---
 drivers/mmc/host/dw_mmc-rockchip.c | 48 ++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 18 deletions(-)
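The patch as posted falls back to 360 only when the property is absent, and uses whatever value the device tree supplies otherwise. A value of 0 would leave the tuning loop with no phases to test, and a value above 360 would make several iterations round to the same degree. One possible way to guard against that is sketched below. This is an assumption about what such a check could look like, not part of the patch: `sanitize_num_phases()` and `MAX_NUM_PHASES` are hypothetical names, and the policy of silently falling back to 360 is just one option.

```c
#include <assert.h>
#include <stdint.h>

/* One test point per degree, matching the behaviour before the patch. */
#define MAX_NUM_PHASES 360u

/*
 * Hypothetical helper: return a usable phase count for a value read
 * from "rockchip,default-num-phases". Unusable values (0 or more than
 * 360) fall back to the default of 360.
 */
static uint32_t sanitize_num_phases(uint32_t requested)
{
	uint32_t n = requested;

	if (n == 0 || n > MAX_NUM_PHASES)
		n = MAX_NUM_PHASES;

	/* Postcondition: the result is always in 1..360. */
	assert(n >= 1 && n <= MAX_NUM_PHASES);
	return n;
}
```

In the driver this would sit right after the `of_property_read_u32()` call in `dw_mci_rk3288_parse_dt()`. Whether an out-of-range value should be clamped, replaced, or rejected with an error is a binding design question for the DT reviewers.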