Message ID | 1459754617-8872-1-git-send-email-philippe.longepe@linux.intel.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On Monday, April 04, 2016 09:23:36 AM Philippe Longepe wrote: > The result returned by pid_calc() is subtracted from current_pstate > (which is the pstate requested during the last period) in order to > obtain the target pstate for the current iteration. > However, current_pstate may not reflect the real current P-state of > the CPU. In particular, that P-state may be higher because of the > frequency sharing per module. > > The theory is: > > - The load is the percentage of time spent in C0 and is related to > the average frequency during the same period (We'll not have the > same load at 1GHz or at 2GHz for the same task running). > > - The current frequency can be completely different than the > average frequency (because of frequency sharing or throttling). > > => The frequency shift computed by the pid_calc is based on the > load, so it must be applied to the frequency with which the load > was measured. > > Using the average pstate instead of current pstate solve some > migration issues (e.g when a task migrates from one core to another > in the same package/module and all of the cores in there except for > that particular one are basically idle). > > Performance and power comparison with this patch on Android: > > IPLoad+Avg-Pstate vs IP Load: > > Benchmark ?Perf ?Power > FishTank 10.45% 3.1% > SmartBench-Gaming -0.1% -10.4% > SmartBench-Productivity -0.8% -10.4% > CandyCrush n/a -17.4% > AngryBirds n/a -5.9% > videoPlayback n/a -13.9% > audioPlayback n/a -4.9% > IcyRocks-20-50 0.0% -38.4% > iozone RR -0.16% -1.3% > iozone RW 0.74% -1.3% > > Comparison with the perf algorithm: > (this patch in cpu_load vs Core algorithm) > > Benchmark ?Perf ?Power > SmartBench-Gaming -0.58% -22.8% > SmartBench-Productivity 0.82% > CandyCrush n/a -20.8% > AngryBirds n/a -37.0% > videoPlayback n/a -53.4% > audioPlayback n/a -2.1% > iozone RR -0.55% -13.29% > iozone RW 2.22% > > => No regression > 1% observed and a huge power improvement! > > Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> Srinivas, what about this one? Yes? No? Maybe? > --- > drivers/cpufreq/intel_pstate.c | 8 +++++++- > 1 file changed, 7 insertions(+), 1 deletion(-) > > diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c > index 4b64452..b998e1d 100644 > --- a/drivers/cpufreq/intel_pstate.c > +++ b/drivers/cpufreq/intel_pstate.c > @@ -919,6 +919,12 @@ static inline int32_t get_avg_frequency(struct cpudata *cpu) > cpu->pstate.scaling, cpu->sample.mperf); > } > > +static inline int32_t get_avg_pstate(struct cpudata *cpu) > +{ > + return div64_u64(cpu->pstate.max_pstate_physical * cpu->sample.aperf, > + cpu->sample.mperf); > +} > + > static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu) > { > struct sample *sample = &cpu->sample; > @@ -951,7 +957,7 @@ static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu) > cpu_load = div64_u64(int_tofp(100) * mperf, sample->tsc); > cpu->sample.busy_scaled = cpu_load; > > - return cpu->pstate.current_pstate - pid_calc(&cpu->pid, cpu_load); > + return get_avg_pstate(cpu) - pid_calc(&cpu->pid, cpu_load); > } > > static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu) > -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, 2016-04-22 at 02:48 +0200, Rafael J. Wysocki wrote: > On Monday, April 04, 2016 09:23:36 AM Philippe Longepe wrote: > > > > The result returned by pid_calc() is subtracted from current_pstate > > (which is the pstate requested during the last period) in order to > > obtain the target pstate for the current iteration. > > However, current_pstate may not reflect the real current P-state of > > the CPU. In particular, that P-state may be higher because of the > > frequency sharing per module. > > > > The theory is: > > > > - The load is the percentage of time spent in C0 and is related to > > the average frequency during the same period (We'll not have the > > same load at 1GHz or at 2GHz for the same task running). > > > > - The current frequency can be completely different than the > > average frequency (because of frequency sharing or throttling). > > > > => The frequency shift computed by the pid_calc is based on the > > load, so it must be applied to the frequency with which the load > > was measured. > > > > Using the average pstate instead of current pstate solve some > > migration issues (e.g when a task migrates from one core to another > > in the same package/module and all of the cores in there except for > > that particular one are basically idle). > > > > Performance and power comparison with this patch on Android: > > > > IPLoad+Avg-Pstate vs IP Load: > > > > Benchmark ?Perf ?Power > > FishTank 10.45% 3.1% > > SmartBench-Gaming -0.1% -10.4% > > SmartBench-Productivity -0.8% -10.4% > > CandyCrush n/a -17.4% > > AngryBirds n/a -5.9% > > videoPlayback n/a -13.9% > > audioPlayback n/a -4.9% > > IcyRocks-20-50 0.0% -38.4% > > iozone RR -0.16% -1.3% > > iozone RW 0.74% -1.3% > > > > Comparison with the perf algorithm: > > (this patch in cpu_load vs Core algorithm) > > > > Benchmark ?Perf ?Power > > SmartBench-Gaming -0.58% -22.8% > > SmartBench-Productivity 0.82% > > CandyCrush n/a -20.8% > > AngryBirds n/a -37.0% > > videoPlayback n/a -53.4% > > audioPlayback n/a -2.1% > > iozone RR -0.55% -13.29% > > iozone RW 2.22% > > > > => No regression > 1% observed and a huge power improvement! > > > > Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> > Srinivas, what about this one? Yes? No? Maybe? > I am fine with this change. I would like to edit the commit and remove the bottom table compare with core and statement with exclamation as this not relevant here for the affected platforms. Do you want me to send update? Thanks, Srinivas > > > > --- > > drivers/cpufreq/intel_pstate.c | 8 +++++++- > > 1 file changed, 7 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/cpufreq/intel_pstate.c > > b/drivers/cpufreq/intel_pstate.c > > index 4b64452..b998e1d 100644 > > --- a/drivers/cpufreq/intel_pstate.c > > +++ b/drivers/cpufreq/intel_pstate.c > > @@ -919,6 +919,12 @@ static inline int32_t get_avg_frequency(struct > > cpudata *cpu) > > cpu->pstate.scaling, cpu->sample.mperf); > > } > > > > +static inline int32_t get_avg_pstate(struct cpudata *cpu) > > +{ > > + return div64_u64(cpu->pstate.max_pstate_physical * cpu- > > >sample.aperf, > > + cpu->sample.mperf); > > +} > > + > > static inline int32_t get_target_pstate_use_cpu_load(struct > > cpudata *cpu) > > { > > struct sample *sample = &cpu->sample; > > @@ -951,7 +957,7 @@ static inline int32_t > > get_target_pstate_use_cpu_load(struct cpudata *cpu) > > cpu_load = div64_u64(int_tofp(100) * mperf, sample->tsc); > > cpu->sample.busy_scaled = cpu_load; > > > > - return cpu->pstate.current_pstate - pid_calc(&cpu->pid, > > cpu_load); > > + return get_avg_pstate(cpu) - pid_calc(&cpu->pid, > > cpu_load); > > } > > > > static inline int32_t get_target_pstate_use_performance(struct > > cpudata *cpu) > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pm" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Apr 22, 2016 at 2:59 AM, Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> wrote: > On Fri, 2016-04-22 at 02:48 +0200, Rafael J. Wysocki wrote: >> On Monday, April 04, 2016 09:23:36 AM Philippe Longepe wrote: >> > >> > The result returned by pid_calc() is subtracted from current_pstate >> > (which is the pstate requested during the last period) in order to >> > obtain the target pstate for the current iteration. >> > However, current_pstate may not reflect the real current P-state of >> > the CPU. In particular, that P-state may be higher because of the >> > frequency sharing per module. >> > >> > The theory is: >> > >> > - The load is the percentage of time spent in C0 and is related to >> > the average frequency during the same period (We'll not have the >> > same load at 1GHz or at 2GHz for the same task running). >> > >> > - The current frequency can be completely different than the >> > average frequency (because of frequency sharing or throttling). >> > >> > => The frequency shift computed by the pid_calc is based on the >> > load, so it must be applied to the frequency with which the load >> > was measured. >> > >> > Using the average pstate instead of current pstate solve some >> > migration issues (e.g when a task migrates from one core to another >> > in the same package/module and all of the cores in there except for >> > that particular one are basically idle). >> > >> > Performance and power comparison with this patch on Android: >> > >> > IPLoad+Avg-Pstate vs IP Load: >> > >> > Benchmark ?Perf ?Power >> > FishTank 10.45% 3.1% >> > SmartBench-Gaming -0.1% -10.4% >> > SmartBench-Productivity -0.8% -10.4% >> > CandyCrush n/a -17.4% >> > AngryBirds n/a -5.9% >> > videoPlayback n/a -13.9% >> > audioPlayback n/a -4.9% >> > IcyRocks-20-50 0.0% -38.4% >> > iozone RR -0.16% -1.3% >> > iozone RW 0.74% -1.3% >> > >> > Comparison with the perf algorithm: >> > (this patch in cpu_load vs Core algorithm) >> > >> > Benchmark ?Perf ?Power >> > SmartBench-Gaming -0.58% -22.8% >> > SmartBench-Productivity 0.82% >> > CandyCrush n/a -20.8% >> > AngryBirds n/a -37.0% >> > videoPlayback n/a -53.4% >> > audioPlayback n/a -2.1% >> > iozone RR -0.55% -13.29% >> > iozone RW 2.22% >> > >> > => No regression > 1% observed and a huge power improvement! >> > >> > Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> > > >> Srinivas, what about this one? Yes? No? Maybe? >> > I am fine with this change. > I would like to edit the commit and remove the bottom table compare > with core and statement with exclamation as this not relevant here for > the affected platforms. > > Do you want me to send update? Yes, please. -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Srinivas, Recall a couple of months ago, on the "Increase hold-off time before busyness is scaled" thread, Stephane suggested we try applying this method on the get_target_pstate_use_performance branch of the intel_pstate driver, as opposed to just on the get_target_pstate_use_cpu_load branch. It didn't make any difference with respect to the hold-off time issue. However, I did spend considerable time testing it in other scenarios. It does somewhat temper the occasional tendency to suddenly have a ridiculously high scaled busy number with virtually no load (the same issue from the "[intel-pstate driver regression] processor frequency very high even if in idle" thread, that continued in https://bugzilla.kernel.org/show_bug.cgi?id=115771 ) I didn't find any significant regression, and did observe some small energy savings in some scenarios (admittedly, small enough to have possibly been simply test to test variations, and I didn't do enough tests to extract a definite trend). I am suggesting to consider extending the patch to get_target_pstate_use_performance also. ... Doug -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Hi Doug, On Fri, 2016-04-22 at 08:18 -0700, Doug Smythies wrote: > Srinivas, > > Recall a couple of months ago, on the "Increase hold-off time before > busyness is scaled" > thread, Stephane suggested we try applying this method on the > get_target_pstate_use_performance > branch of the intel_pstate driver, as opposed to just on the > get_target_pstate_use_cpu_load branch. > > It didn't make any difference with respect to the hold-off time > issue. > However, I did spend considerable time testing it in other scenarios. > It does somewhat temper the occasional tendency to suddenly have a > ridiculously high > scaled busy number with virtually no load (the same issue from the > "[intel-pstate driver regression] processor frequency very high even > if in idle" thread, > that continued in https://bugzilla.kernel.org/show_bug.cgi?id=115771 > ) > I didn't find any significant regression, and did observe some small > energy savings > in some scenarios (admittedly, small enough to have possibly been > simply test to test > variations, and I didn't do enough tests to extract a definite > trend). > > I am suggesting to consider extending the patch to > get_target_pstate_use_performance also. > Many core platforms have per core P states. So here the assumption that we get a boost of P State on one core because some activity on other core will not hold true. We are experimenting with algorithm to improve core performance (similar approach as yours + IO boost), hopefully we can publish soon. Thanks, Srinivas > ... Doug > > > -- > To unsubscribe from this list: send the line "unsubscribe linux-pm" > in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c index 4b64452..b998e1d 100644 --- a/drivers/cpufreq/intel_pstate.c +++ b/drivers/cpufreq/intel_pstate.c @@ -919,6 +919,12 @@ static inline int32_t get_avg_frequency(struct cpudata *cpu) cpu->pstate.scaling, cpu->sample.mperf); } +static inline int32_t get_avg_pstate(struct cpudata *cpu) +{ + return div64_u64(cpu->pstate.max_pstate_physical * cpu->sample.aperf, + cpu->sample.mperf); +} + static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu) { struct sample *sample = &cpu->sample; @@ -951,7 +957,7 @@ static inline int32_t get_target_pstate_use_cpu_load(struct cpudata *cpu) cpu_load = div64_u64(int_tofp(100) * mperf, sample->tsc); cpu->sample.busy_scaled = cpu_load; - return cpu->pstate.current_pstate - pid_calc(&cpu->pid, cpu_load); + return get_avg_pstate(cpu) - pid_calc(&cpu->pid, cpu_load); } static inline int32_t get_target_pstate_use_performance(struct cpudata *cpu)
The result returned by pid_calc() is subtracted from current_pstate (which is the pstate requested during the last period) in order to obtain the target pstate for the current iteration. However, current_pstate may not reflect the real current P-state of the CPU. In particular, that P-state may be higher because of the frequency sharing per module. The theory is: - The load is the percentage of time spent in C0 and is related to the average frequency during the same period (We'll not have the same load at 1GHz or at 2GHz for the same task running). - The current frequency can be completely different than the average frequency (because of frequency sharing or throttling). => The frequency shift computed by the pid_calc is based on the load, so it must be applied to the frequency with which the load was measured. Using the average pstate instead of current pstate solve some migration issues (e.g when a task migrates from one core to another in the same package/module and all of the cores in there except for that particular one are basically idle). Performance and power comparison with this patch on Android: IPLoad+Avg-Pstate vs IP Load: Benchmark ?Perf ?Power FishTank 10.45% 3.1% SmartBench-Gaming -0.1% -10.4% SmartBench-Productivity -0.8% -10.4% CandyCrush n/a -17.4% AngryBirds n/a -5.9% videoPlayback n/a -13.9% audioPlayback n/a -4.9% IcyRocks-20-50 0.0% -38.4% iozone RR -0.16% -1.3% iozone RW 0.74% -1.3% Comparison with the perf algorithm: (this patch in cpu_load vs Core algorithm) Benchmark ?Perf ?Power SmartBench-Gaming -0.58% -22.8% SmartBench-Productivity 0.82% CandyCrush n/a -20.8% AngryBirds n/a -37.0% videoPlayback n/a -53.4% audioPlayback n/a -2.1% iozone RR -0.55% -13.29% iozone RW 2.22% => No regression > 1% observed and a huge power improvement! Signed-off-by: Philippe Longepe <philippe.longepe@linux.intel.com> --- drivers/cpufreq/intel_pstate.c | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-)