[2/2] mmc: dw_mmc-rockchip: parse rockchip,default-num-phases from DT

Message ID 1492592434-81312-3-git-send-email-shawn.lin@rock-chips.com (mailing list archive)
State New, archived

Commit Message

Shawn Lin April 19, 2017, 9 a.m. UTC
Currently we unconditionally do tuning for each degree, which
costs 900ms for each boot and resume.

Someone may argue that this is a question of accuracy vs. time. But I
would say it is a decision that has to be made for each board.
If we don't care about the time we spend at all, we can certainly do
tuning for each degree. But when we need to improve the user experience,
for instance to speed up resuming from S3, we should also be able to
do that. This patch adds parsing of "rockchip,default-num-phases" so that
folks can specify the number of phases to test during tuning. If it is
not specified, 360 will be used as before.

Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>

---

 drivers/mmc/host/dw_mmc-rockchip.c | 48 ++++++++++++++++++++++++--------------
 1 file changed, 30 insertions(+), 18 deletions(-)

Comments

Doug Anderson April 19, 2017, 8:19 p.m. UTC | #1
Hi,

On Wed, Apr 19, 2017 at 2:00 AM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> Currently we unconditionally do tuning for each degree, which
> costs 900ms for each boot and resume.
>
> May someone argue that this is a question of accuracy VS time. But I
> would say it's a trick of how we need to do decision for our boards.
> If we don't care the time we spend at all, we could definitely do tuning
> for each degree. But when we need to improve the user experience, for
> instance, speed up resuming from S3, we should also have the right to
> do that. This patch add parsing "rockchip,default-num-phases", for folks
> to specify the number of doing tuning. If not specified, 360 will be used
> as before.
>
> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>
> ---
>
>  drivers/mmc/host/dw_mmc-rockchip.c | 48 ++++++++++++++++++++++++--------------
>  1 file changed, 30 insertions(+), 18 deletions(-)

No huge objection here, but I do remember we ended up at the 360
phases due to some of the craziness with dw_mmc delay elements on
rk3288.  IIRC one of the big problems was that the delay elements
changed _a lot_ with the "LOGIC" voltage and we tweaked the voltage at
runtime for DDR DVFS.  That imposed an extra need to be very accurate
on that SoC, at least on any board that was designed to support DDR
DVFS.

I also remember there being some weirdness on the Rockchip
implementation where there was a certain set of phases that the MMC
controller was essentially "blind".  This blind spot was in the middle
of an otherwise good range of points.  Unfortunately this blind spot
was somewhat hard to detect properly because it was not very big.
...the variability of the delay elements meant that there could be big
ranges where we weren't getting any good test coverage, but also the
fact that they changed with the LOGIC voltage might mean that we
weren't in the "blind" spot and then suddenly we were.

One other note is that I remember that the vast majority of time spent
tuning was dealing with "bad" phases, not dealing with "good" phases.
If you're looking to speed things up, maybe finding a way to make
"bad" phases fail faster would be wise?  I think one of the reasons
bad phases failed so slowly is because the dw_mmc timeouts are all so
long.

Oh, and I guess one last note is that I have no idea if folks will
like the device bindings here.  Part of me thinks it should be
something more "symbolic" like "rockchip,need-accurate-tuning" or
something like that.  I guess I'd let the DT experts chime in.


So I guess to summarize:
* On rk3288 boards w/ DDR DVFS (or any other similar boards), 360
seems to provide real benefit.
* On other boards, probably you can get by with fewer phases.


-Doug
--
To unsubscribe from this list: send the line "unsubscribe linux-mmc" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shawn Lin April 20, 2017, 1:21 a.m. UTC | #2
Hi Doug,

On 2017/4/20 4:19, Doug Anderson wrote:
> Hi,
>
> On Wed, Apr 19, 2017 at 2:00 AM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> Currently we unconditionally do tuning for each degree, which
>> costs 900ms for each boot and resume.
>>
>> May someone argue that this is a question of accuracy VS time. But I
>> would say it's a trick of how we need to do decision for our boards.
>> If we don't care the time we spend at all, we could definitely do tuning
>> for each degree. But when we need to improve the user experience, for
>> instance, speed up resuming from S3, we should also have the right to
>> do that. This patch add parsing "rockchip,default-num-phases", for folks
>> to specify the number of doing tuning. If not specified, 360 will be used
>> as before.
>>
>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>
>> ---
>>
>>  drivers/mmc/host/dw_mmc-rockchip.c | 48 ++++++++++++++++++++++++--------------
>>  1 file changed, 30 insertions(+), 18 deletions(-)
>
> No huge objection here, but I do remember we ended up at the 360
> phases due to some of the craziness with dw_mmc delay elements on
> rk3288.  IIRC one of the big problems was that the delay elements
> changed _a lot_ with the "LOGIC" voltage and we tweaked the voltage at
> runtime for DDR DVFS.  That imposed an extra need to be very accurate
> on that SoC, at least on any board that was designed to support DDR
> DVFS.
>

Not just with vdd_logic but also with the process of the SoC.
To better guarantee the accuracy, we first use the delay elements to do
tuning and then convert the result into a combination of degree + delay
element. But as the delay elements aren't accurate themselves, all the
math we do here is a trick.

> I also remember there being some weirdness on the Rockchip
> implementation where there was a certain set of phases that the MMC
> controller was essentially "blind".  This blind spot was in the middle
> of an otherwise good range of points.  Unfortunately this blind spot
> was somewhat hard to detect properly because it was not very big.
> ...the variability of the delay elements meant that there could be big
> ranges where we weren't getting any good test coverage, but also the
> fact that they changed with the LOGIC voltage might mean that we
> weren't in the "blind" spot and then suddenly we were.

I understand all of this, as I suffered from it when bringing up
RK3288.

>
> One other note is that i remember that the vast majority of time spent
> tuning was dealing with "bad" phases, not dealing with "good" phases.
> If you're looking to speed things up, maybe finding a way to make
> "bad" phases fail faster would be wise?  I think one of the reasons
> bad phases failed so slowly is because the dw_mmc timeouts are all so
> long.

Good point. I haven't thought about speeding up the handling of bad
phases, but I will take a look at this.

>
> Oh, and I guess one last note is that I have no idea if folks will
> like the device bindings here.  Part of me thinks it should be
> something more "symbolic" like "rockchip,need-accurate-tuning" or
> something like that.  I guess I'd let the DT experts chime in.
>
>
> So I guess to summarize:
> * On rk3288 boards w/ DDR DVFS (or any other similar boards), 360
> seems to provide real benefit.
> * On other boards, probably you can get by with fewer phases.
>

I would say it's a question of "900ms for a single time" vs.
"some discrete tuning cost with more chance of retuning".

(1) We could do a more accurate tuning process and spend 900ms.
Then we have less chance of retuning later.

(2) We do a rough tuning and have more chance of retuning later.

I would also say that this is a trade-off, and we can't say which
one is better. Obviously the "900ms" now always happens in the critical
path, for instance, booting and resuming from S3.

>
> -Doug
>
>
>

Doug Anderson April 24, 2017, 4:18 p.m. UTC | #3
Hi,

On Wed, Apr 19, 2017 at 6:21 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
> Hi Doug,
>
> On 2017/4/20 4:19, Doug Anderson wrote:
>>
>> Hi,
>>
>> On Wed, Apr 19, 2017 at 2:00 AM, Shawn Lin <shawn.lin@rock-chips.com>
>> wrote:
>>>
>>> Currently we unconditionally do tuning for each degree, which
>>> costs 900ms for each boot and resume.
>>>
>>> May someone argue that this is a question of accuracy VS time. But I
>>> would say it's a trick of how we need to do decision for our boards.
>>> If we don't care the time we spend at all, we could definitely do tuning
>>> for each degree. But when we need to improve the user experience, for
>>> instance, speed up resuming from S3, we should also have the right to
>>> do that. This patch add parsing "rockchip,default-num-phases", for folks
>>> to specify the number of doing tuning. If not specified, 360 will be used
>>> as before.
>>>
>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>
>>> ---
>>>
>>>  drivers/mmc/host/dw_mmc-rockchip.c | 48
>>> ++++++++++++++++++++++++--------------
>>>  1 file changed, 30 insertions(+), 18 deletions(-)
>>
>>
>> No huge objection here, but I do remember we ended up at the 360
>> phases due to some of the craziness with dw_mmc delay elements on
>> rk3288.  IIRC one of the big problems was that the delay elements
>> changed _a lot_ with the "LOGIC" voltage and we tweaked the voltage at
>> runtime for DDR DVFS.  That imposed an extra need to be very accurate
>> on that SoC, at least on any board that was designed to support DDR
>> DVFS.
>>
>
> Not just with the vdd_logic but also with the process of Soc.
> To better guaratee the accuracy, firstly we use delay element to do
> tuning and then convert it to be combination of degree + delay element.
> But as the dalay elements aren't accuracy themself, so all the math we
> do here is trick.

Yup.  I brought up the vdd logic specifically because it's something
that can make the phases change quite dramatically on the same machine
between the time you tuned and the time you used it.


>> I also remember there being some weirdness on the Rockchip
>> implementation where there was a certain set of phases that the MMC
>> controller was essentially "blind".  This blind spot was in the middle
>> of an otherwise good range of points.  Unfortunately this blind spot
>> was somewhat hard to detect properly because it was not very big.
>> ...the variability of the delay elements meant that there could be big
>> ranges where we weren't getting any good test coverage, but also the
>> fact that they changed with the LOGIC voltage might mean that we
>> weren't in the "blind" spot and then suddenly we were.
>
>
> I undertand all of these as I was suffering from it when bringing up
> RK3288.
>
>>
>> One other note is that i remember that the vast majority of time spent
>> tuning was dealing with "bad" phases, not dealing with "good" phases.
>> If you're looking to speed things up, maybe finding a way to make
>> "bad" phases fail faster would be wise?  I think one of the reasons
>> bad phases failed so slowly is because the dw_mmc timeouts are all so
>> long.
>
>
> Good point. I haven't thought of speeding up the handle of bad phases,
> but I will take a look at this.
>
>>
>> Oh, and I guess one last note is that I have no idea if folks will
>> like the device bindings here.  Part of me thinks it should be
>> something more "symbolic" like "rockchip,need-accurate-tuning" or
>> something like that.  I guess I'd let the DT experts chime in.
>>
>>
>> So I guess to summarize:
>> * On rk3288 boards w/ DDR DVFS (or any other similar boards), 360
>> seems to provide real benefit.
>> * On other boards, probably you can get by with fewer phases.
>>
>
> I would try to say it's a question of "900ms for a single time" VS.
> "some of discrete tuning cost for more chance to do retune".
>
> (1)We could try to do a more accurate tuning process and spends 900ms.
> Then we have less chance to do retune later.
>
> (2)We do a rough tuning and have more chance to do retune later.

Ah, interesting point.  I haven't used newer versions of Linux much,
but I seem to remember that they will automatically retune sometimes
if they see errors.  That makes your strategy a bit more valid.


> I also would say that this is a game , and we can't say which
> one is better. Obvious now the "900ms" alwyas happens in the spot
> routine, for instance, booting and resuming from S3.

Is it really 900 ms?  I don't quite remember it being that long, but I
could be remembering incorrectly.

-Doug
Shawn Lin May 2, 2017, 6:58 a.m. UTC | #4
Hi Doug,

On 2017/4/25 0:18, Doug Anderson wrote:
> Hi,
>
> On Wed, Apr 19, 2017 at 6:21 PM, Shawn Lin <shawn.lin@rock-chips.com> wrote:
>> Hi Doug,
>>
>> On 2017/4/20 4:19, Doug Anderson wrote:
>>>
>>> Hi,
>>>
>>> On Wed, Apr 19, 2017 at 2:00 AM, Shawn Lin <shawn.lin@rock-chips.com>
>>> wrote:
>>>>
>>>> Currently we unconditionally do tuning for each degree, which
>>>> costs 900ms for each boot and resume.
>>>>
>>>> May someone argue that this is a question of accuracy VS time. But I
>>>> would say it's a trick of how we need to do decision for our boards.
>>>> If we don't care the time we spend at all, we could definitely do tuning
>>>> for each degree. But when we need to improve the user experience, for
>>>> instance, speed up resuming from S3, we should also have the right to
>>>> do that. This patch add parsing "rockchip,default-num-phases", for folks
>>>> to specify the number of doing tuning. If not specified, 360 will be used
>>>> as before.
>>>>
>>>> Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
>>>>
>>>> ---
>>>>
>>>>  drivers/mmc/host/dw_mmc-rockchip.c | 48
>>>> ++++++++++++++++++++++++--------------
>>>>  1 file changed, 30 insertions(+), 18 deletions(-)
>>>
>>>
>>> No huge objection here, but I do remember we ended up at the 360
>>> phases due to some of the craziness with dw_mmc delay elements on
>>> rk3288.  IIRC one of the big problems was that the delay elements
>>> changed _a lot_ with the "LOGIC" voltage and we tweaked the voltage at
>>> runtime for DDR DVFS.  That imposed an extra need to be very accurate
>>> on that SoC, at least on any board that was designed to support DDR
>>> DVFS.
>>>
>>
>> Not just with the vdd_logic but also with the process of Soc.
>> To better guaratee the accuracy, firstly we use delay element to do
>> tuning and then convert it to be combination of degree + delay element.
>> But as the dalay elements aren't accuracy themself, so all the math we
>> do here is trick.
>
> Yup.  I brought up the vdd logic specifically because it's something
> that can make the phases change quite dramatically on the same machine
> between the time you tuned and the time you used it.
>
>
>>> I also remember there being some weirdness on the Rockchip
>>> implementation where there was a certain set of phases that the MMC
>>> controller was essentially "blind".  This blind spot was in the middle
>>> of an otherwise good range of points.  Unfortunately this blind spot
>>> was somewhat hard to detect properly because it was not very big.
>>> ...the variability of the delay elements meant that there could be big
>>> ranges where we weren't getting any good test coverage, but also the
>>> fact that they changed with the LOGIC voltage might mean that we
>>> weren't in the "blind" spot and then suddenly we were.
>>
>>
>> I undertand all of these as I was suffering from it when bringing up
>> RK3288.
>>
>>>
>>> One other note is that i remember that the vast majority of time spent
>>> tuning was dealing with "bad" phases, not dealing with "good" phases.
>>> If you're looking to speed things up, maybe finding a way to make
>>> "bad" phases fail faster would be wise?  I think one of the reasons
>>> bad phases failed so slowly is because the dw_mmc timeouts are all so
>>> long.
>>
>>
>> Good point. I haven't thought of speeding up the handle of bad phases,
>> but I will take a look at this.
>>
>>>
>>> Oh, and I guess one last note is that I have no idea if folks will
>>> like the device bindings here.  Part of me thinks it should be
>>> something more "symbolic" like "rockchip,need-accurate-tuning" or
>>> something like that.  I guess I'd let the DT experts chime in.
>>>
>>>
>>> So I guess to summarize:
>>> * On rk3288 boards w/ DDR DVFS (or any other similar boards), 360
>>> seems to provide real benefit.
>>> * On other boards, probably you can get by with fewer phases.
>>>
>>
>> I would try to say it's a question of "900ms for a single time" VS.
>> "some of discrete tuning cost for more chance to do retune".
>>
>> (1)We could try to do a more accurate tuning process and spends 900ms.
>> Then we have less chance to do retune later.
>>
>> (2)We do a rough tuning and have more chance to do retune later.
>
> Ah, interesting point.  I haven't used newer versions of Linux much,
> but I seem to remember that they will automatically retune sometimes
> if they see errors.  That makes your strategy a bit more valid.
>
>
>> I also would say that this is a game , and we can't say which
>> one is better. Obvious now the "900ms" alwyas happens in the spot
>> routine, for instance, booting and resuming from S3.
>
> Is it really 900 ms?  I don't quite remember it being that long, but I
> could be remembering incorrectly.

The worst case I saw was nearly 900ms, but mostly we need 600ms there.

>
> -Doug
>
>
>


Patch

diff --git a/drivers/mmc/host/dw_mmc-rockchip.c b/drivers/mmc/host/dw_mmc-rockchip.c
index 372fb6e..c535526 100644
--- a/drivers/mmc/host/dw_mmc-rockchip.c
+++ b/drivers/mmc/host/dw_mmc-rockchip.c
@@ -25,6 +25,7 @@  struct dw_mci_rockchip_priv_data {
 	struct clk		*drv_clk;
 	struct clk		*sample_clk;
 	int			default_sample_phase;
+	int			num_phases;
 };
 
 static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
@@ -133,8 +134,8 @@  static void dw_mci_rk3288_set_ios(struct dw_mci *host, struct mmc_ios *ios)
 	}
 }
 
-#define NUM_PHASES			360
-#define TUNING_ITERATION_TO_PHASE(i)	(DIV_ROUND_UP((i) * 360, NUM_PHASES))
+#define TUNING_ITERATION_TO_PHASE(i, num_phases) \
+		(DIV_ROUND_UP((i) * 360, num_phases))
 
 static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 {
@@ -159,13 +160,15 @@  static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		return -EIO;
 	}
 
-	ranges = kmalloc_array(NUM_PHASES / 2 + 1, sizeof(*ranges), GFP_KERNEL);
+	ranges = kmalloc_array(priv->num_phases / 2 + 1,
+			       sizeof(*ranges), GFP_KERNEL);
 	if (!ranges)
 		return -ENOMEM;
 
 	/* Try each phase and extract good ranges */
-	for (i = 0; i < NUM_PHASES; ) {
-		clk_set_phase(priv->sample_clk, TUNING_ITERATION_TO_PHASE(i));
+	for (i = 0; i < priv->num_phases; ) {
+		clk_set_phase(priv->sample_clk,
+			      TUNING_ITERATION_TO_PHASE(i, priv->num_phases));
 
 		v = !mmc_send_tuning(mmc, opcode, NULL);
 
@@ -179,7 +182,7 @@  static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		if (v) {
 			ranges[range_count-1].end = i;
 			i++;
-		} else if (i == NUM_PHASES - 1) {
+		} else if (i == priv->num_phases - 1) {
 			/* No extra skipping rules if we're at the end */
 			i++;
 		} else {
@@ -188,11 +191,11 @@  static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 			 * one since testing bad phases is slow.  Skip
 			 * 20 degrees.
 			 */
-			i += DIV_ROUND_UP(20 * NUM_PHASES, 360);
+			i += DIV_ROUND_UP(20 * priv->num_phases, 360);
 
 			/* Always test the last one */
-			if (i >= NUM_PHASES)
-				i = NUM_PHASES - 1;
+			if (i >= priv->num_phases)
+				i = priv->num_phases - 1;
 		}
 
 		prev_v = v;
@@ -210,7 +213,7 @@  static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		range_count--;
 	}
 
-	if (ranges[0].start == 0 && ranges[0].end == NUM_PHASES - 1) {
+	if (ranges[0].start == 0 && ranges[0].end == priv->num_phases - 1) {
 		clk_set_phase(priv->sample_clk, priv->default_sample_phase);
 		dev_info(host->dev, "All phases work, using default phase %d.",
 			 priv->default_sample_phase);
@@ -222,7 +225,7 @@  static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		int len = (ranges[i].end - ranges[i].start + 1);
 
 		if (len < 0)
-			len += NUM_PHASES;
+			len += priv->num_phases;
 
 		if (longest_range_len < len) {
 			longest_range_len = len;
@@ -230,25 +233,30 @@  static int dw_mci_rk3288_execute_tuning(struct dw_mci_slot *slot, u32 opcode)
 		}
 
 		dev_dbg(host->dev, "Good phase range %d-%d (%d len)\n",
-			TUNING_ITERATION_TO_PHASE(ranges[i].start),
-			TUNING_ITERATION_TO_PHASE(ranges[i].end),
+			TUNING_ITERATION_TO_PHASE(ranges[i].start,
+						  priv->num_phases),
+			TUNING_ITERATION_TO_PHASE(ranges[i].end,
+						  priv->num_phases),
 			len
 		);
 	}
 
 	dev_dbg(host->dev, "Best phase range %d-%d (%d len)\n",
-		TUNING_ITERATION_TO_PHASE(ranges[longest_range].start),
-		TUNING_ITERATION_TO_PHASE(ranges[longest_range].end),
+		TUNING_ITERATION_TO_PHASE(ranges[longest_range].start,
+					  priv->num_phases),
+		TUNING_ITERATION_TO_PHASE(ranges[longest_range].end,
+					  priv->num_phases),
 		longest_range_len
 	);
 
 	middle_phase = ranges[longest_range].start + longest_range_len / 2;
-	middle_phase %= NUM_PHASES;
+	middle_phase %= priv->num_phases;
 	dev_info(host->dev, "Successfully tuned phase to %d\n",
-		 TUNING_ITERATION_TO_PHASE(middle_phase));
+		 TUNING_ITERATION_TO_PHASE(middle_phase, priv->num_phases));
 
 	clk_set_phase(priv->sample_clk,
-		      TUNING_ITERATION_TO_PHASE(middle_phase));
+		      TUNING_ITERATION_TO_PHASE(middle_phase,
+						priv->num_phases));
 
 free:
 	kfree(ranges);
@@ -264,6 +272,10 @@  static int dw_mci_rk3288_parse_dt(struct dw_mci *host)
 	if (!priv)
 		return -ENOMEM;
 
+	if (of_property_read_u32(np, "rockchip,default-num-phases",
+					&priv->num_phases))
+		priv->num_phases = 360;
+
 	if (of_property_read_u32(np, "rockchip,default-sample-phase",
 					&priv->default_sample_phase))
 		priv->default_sample_phase = 0;