| Field | Value |
|---|---|
| Message ID | 1501812194-55363-1-git-send-email-srinivas.pandruvada@linux.intel.com (mailing list archive) |
| State | Mainlined |
| Delegated to | Rafael Wysocki |
On Friday, August 4, 2017 4:03:14 AM CEST Srinivas Pandruvada wrote:
> In the current implementation, the latency from receiving
> SCHED_CPUFREQ_IOWAIT to the actual P-state adjustment can be up to
> 10 ms. This can be improved by reacting to SCHED_CPUFREQ_IOWAIT by
> jumping to the max P-state immediately. With this change, IO
> performance improves significantly.
>
> With a simple "grep -r . linux" (here "linux" is the kernel source
> folder), with caches dropped every time, on a platform with per-core
> P-states (a Broadwell Xeon workstation), user and system time improve
> by as much as 30% to 40%.
>
> The same performance difference was not observed on client parts,
> which don't have per-core P-state support.
>
> Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
> ---
> v2:
> As suggested by Rafael, also update the cpu->last_update time.
>
>  drivers/cpufreq/intel_pstate.c | 10 ++++++++++
>  1 file changed, 10 insertions(+)
>
> diff --git a/drivers/cpufreq/intel_pstate.c b/drivers/cpufreq/intel_pstate.c
> index 90e8f2b..1cb318b 100644
> --- a/drivers/cpufreq/intel_pstate.c
> +++ b/drivers/cpufreq/intel_pstate.c
> @@ -1530,6 +1530,15 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
>
>  	if (flags & SCHED_CPUFREQ_IOWAIT) {
>  		cpu->iowait_boost = int_tofp(1);
> +		cpu->last_update = time;
> +		/*
> +		 * The last time the busy was 100% so P-state was max anyway
> +		 * so avoid overhead of computation.
> +		 */
> +		if (fp_toint(cpu->sample.busy_scaled) == 100)
> +			return;
> +
> +		goto set_pstate;
>  	} else if (cpu->iowait_boost) {
>  		/* Clear iowait_boost if the CPU may have been idle. */
>  		delta_ns = time - cpu->last_update;
> @@ -1541,6 +1550,7 @@ static void intel_pstate_update_util(struct update_util_data *data, u64 time,
>  	if ((s64)delta_ns < INTEL_PSTATE_DEFAULT_SAMPLING_INTERVAL)
>  		return;
>
> +set_pstate:
>  	if (intel_pstate_sample(cpu, time)) {
>  		int target_pstate;

Applied, thanks!
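For readers outside the kernel tree, the decision the patch adds can be sketched in isolation. This is a minimal userspace model, not the actual driver code: the fixed-point helpers mirror intel_pstate's FRAC_BITS=8 format, but the flag value, function name, and parameters here are illustrative assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fixed-point helpers modeled on intel_pstate's FRAC_BITS = 8 format. */
#define FRAC_BITS 8
#define int_tofp(X) ((int64_t)(X) << FRAC_BITS)
#define fp_toint(X) ((X) >> FRAC_BITS)

#define SCHED_CPUFREQ_IOWAIT (1U << 0) /* flag value is illustrative */

/*
 * Should this utilization update sample and set a P-state right now?
 * Mirrors the patch's logic: on an IOWAIT update, skip the work only if
 * the last sample already showed 100% busy (the P-state is at max
 * anyway); otherwise jump straight to sampling instead of waiting out
 * the normal sampling interval, which could add up to 10 ms of latency.
 */
static bool should_sample_now(unsigned int flags, int64_t busy_scaled,
                              int64_t delta_ns, int64_t interval_ns)
{
    if (flags & SCHED_CPUFREQ_IOWAIT)
        return fp_toint(busy_scaled) != 100; /* "goto set_pstate" path */
    return delta_ns >= interval_ns;          /* normal rate limiting */
}
```

The point of the `== 100` check is purely to avoid redundant computation: if the CPU was already fully busy, the previous sample already drove the P-state to max, so the boost changes nothing.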