Message ID | 538D9631.9090500@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On 3 June 2014 15:02, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c > index e1c6433..3e8588f 100644 > --- a/drivers/cpufreq/cpufreq_governor.c > +++ b/drivers/cpufreq/cpufreq_governor.c > @@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) > struct od_dbs_tuners *od_tuners = dbs_data->tuners; > struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; > struct cpufreq_policy *policy; > + unsigned int sampling_rate; > unsigned int max_load = 0; > unsigned int ignore_nice; > unsigned int j; > > - if (dbs_data->cdata->governor == GOV_ONDEMAND) > + if (dbs_data->cdata->governor == GOV_ONDEMAND) { > + struct od_cpu_dbs_info_s *od_dbs_info; > + > + /* > + * Sometimes, the ondemand governor uses an additional > + * multiplier to give long delays. So apply this multiplier to > + * the 'sampling_rate', so as to keep the wake-up-from-idle > + * detection logic a bit conservative. > + */ > + sampling_rate = od_tuners->sampling_rate; > + od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu); Probably do both above right after definition of od_dbs_info or merge above with it. and just keep below after the comment ? > + sampling_rate *= od_dbs_info->rate_mult; -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 06/03/2014 03:09 PM, Viresh Kumar wrote: > On 3 June 2014 15:02, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: >> diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c >> index e1c6433..3e8588f 100644 >> --- a/drivers/cpufreq/cpufreq_governor.c >> +++ b/drivers/cpufreq/cpufreq_governor.c >> @@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) >> struct od_dbs_tuners *od_tuners = dbs_data->tuners; >> struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; >> struct cpufreq_policy *policy; >> + unsigned int sampling_rate; >> unsigned int max_load = 0; >> unsigned int ignore_nice; >> unsigned int j; >> >> - if (dbs_data->cdata->governor == GOV_ONDEMAND) >> + if (dbs_data->cdata->governor == GOV_ONDEMAND) { >> + struct od_cpu_dbs_info_s *od_dbs_info; >> + >> + /* >> + * Sometimes, the ondemand governor uses an additional >> + * multiplier to give long delays. So apply this multiplier to >> + * the 'sampling_rate', so as to keep the wake-up-from-idle >> + * detection logic a bit conservative. >> + */ >> + sampling_rate = od_tuners->sampling_rate; >> + od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu); > > Probably do both above right after definition of od_dbs_info or merge > above with it. and just keep below after the comment ? > >> + sampling_rate *= od_dbs_info->rate_mult; > Well, the method I used keeps the organization such that the code following the comment does precisely what the comment says (i.e, get the sampling_rate, fetch the multiplier, and then multiply). So I feel it makes it easier to understand. Regards, Srivatsa S. Bhat -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 3 June 2014 15:34, Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> wrote: > Well, the method I used keeps the organization such that the code following > the comment does precisely what the comment says (i.e, get the sampling_rate, > fetch the multiplier, and then multiply). So I feel it makes it easier to > understand. It looked like the comment is there only for this special statement: >>> + sampling_rate *= od_dbs_info->rate_mult; And so suggested that :) Anyway move this up as it doesn't belong to comment for sure. >> + od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu); -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/drivers/cpufreq/cpufreq_governor.c b/drivers/cpufreq/cpufreq_governor.c index e1c6433..3e8588f 100644 --- a/drivers/cpufreq/cpufreq_governor.c +++ b/drivers/cpufreq/cpufreq_governor.c @@ -36,14 +36,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) struct od_dbs_tuners *od_tuners = dbs_data->tuners; struct cs_dbs_tuners *cs_tuners = dbs_data->tuners; struct cpufreq_policy *policy; + unsigned int sampling_rate; unsigned int max_load = 0; unsigned int ignore_nice; unsigned int j; - if (dbs_data->cdata->governor == GOV_ONDEMAND) + if (dbs_data->cdata->governor == GOV_ONDEMAND) { + struct od_cpu_dbs_info_s *od_dbs_info; + + /* + * Sometimes, the ondemand governor uses an additional + * multiplier to give long delays. So apply this multiplier to + * the 'sampling_rate', so as to keep the wake-up-from-idle + * detection logic a bit conservative. + */ + sampling_rate = od_tuners->sampling_rate; + od_dbs_info = dbs_data->cdata->get_cpu_dbs_info_s(cpu); + sampling_rate *= od_dbs_info->rate_mult; + ignore_nice = od_tuners->ignore_nice_load; - else + } else { + sampling_rate = cs_tuners->sampling_rate; ignore_nice = cs_tuners->ignore_nice_load; + } policy = cdbs->cur_policy; @@ -96,7 +111,29 @@ void dbs_check_cpu(struct dbs_data *dbs_data, int cpu) if (unlikely(!wall_time || wall_time < idle_time)) continue; - load = 100 * (wall_time - idle_time) / wall_time; + /* + * If the CPU had gone completely idle, and a task just woke up + * on this CPU now, it would be unfair to calculate 'load' the + * usual way for this elapsed time-window, because it will show + * near-zero load, irrespective of how CPU intensive the new + * task is. This is undesirable for latency-sensitive bursty + * workloads. + * + * To avoid this, we reuse the 'load' from the previous + * time-window and give this task a chance to start with a + * reasonably high CPU frequency. + * + * Detecting this situation is easy: the governor's deferrable + * timer would not have fired during CPU-idle periods. Hence + * an unusually large 'wall_time' (as compared to the sampling + * rate) indicates this scenario. + */ + if (unlikely(wall_time > (2 * sampling_rate))) { + load = j_cdbs->prev_load; + } else { + load = 100 * (wall_time - idle_time) / wall_time; + j_cdbs->prev_load = load; + } if (load > max_load) max_load = load; @@ -323,6 +360,10 @@ int cpufreq_governor_dbs(struct cpufreq_policy *policy, j_cdbs->cur_policy = policy; j_cdbs->prev_cpu_idle = get_cpu_idle_time(j, &j_cdbs->prev_cpu_wall, io_busy); + j_cdbs->prev_load = 100 * (j_cdbs->prev_cpu_wall - + j_cdbs->prev_cpu_idle) / + j_cdbs->prev_cpu_wall; + if (ignore_nice) j_cdbs->prev_cpu_nice = kcpustat_cpu(j).cpustat[CPUTIME_NICE]; diff --git a/drivers/cpufreq/cpufreq_governor.h b/drivers/cpufreq/cpufreq_governor.h index bfb9ae1..b56552b 100644 --- a/drivers/cpufreq/cpufreq_governor.h +++ b/drivers/cpufreq/cpufreq_governor.h @@ -134,6 +134,7 @@ struct cpu_dbs_common_info { u64 prev_cpu_idle; u64 prev_cpu_wall; u64 prev_cpu_nice; + unsigned int prev_load; struct cpufreq_policy *cur_policy; struct delayed_work work; /*