Message ID | 20191011134500.235736-5-douglas.raillard@arm.com (mailing list archive) |
---|---|
State | RFC, archived |
Series | sched/cpufreq: Make schedutil energy aware |
On Fri, Oct 11, 2019 at 02:44:58PM +0100, Douglas RAILLARD wrote:
> Use the utilization signals dynamic to detect when the utilization of a
> set of tasks starts increasing because of a change in tasks' behavior.
> This allows detecting when spending extra power for faster frequency
> ramp up response would be beneficial to the reactivity of the system.
>
> This ramp boost is computed as the difference
> util_avg-util_est_enqueued. This number somehow represents a lower bound

That reads funny, maybe 'as the difference between util_avg and
util_est_enqueued' ?

> of how much extra utilization this tasks is actually using, compared to
> our best current stable knowledge of it (which is util_est_enqueued).
>
> When the set of runnable tasks changes, the boost is disabled as the
> impact of blocked utilization on util_avg will make the delta with
> util_est_enqueued not very informative.

> @@ -561,6 +604,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> 		}
> 	}
>
> +
> 	return get_next_freq(sg_policy, util, max);
> }

Surely we can do without this extra whitespace? :-)
Hi Peter,

On 10/14/19 3:33 PM, Peter Zijlstra wrote:
> On Fri, Oct 11, 2019 at 02:44:58PM +0100, Douglas RAILLARD wrote:
>> Use the utilization signals dynamic to detect when the utilization of a
>> set of tasks starts increasing because of a change in tasks' behavior.
>> This allows detecting when spending extra power for faster frequency
>> ramp up response would be beneficial to the reactivity of the system.
>>
>> This ramp boost is computed as the difference
>> util_avg-util_est_enqueued. This number somehow represents a lower bound
>
> That reads funny, maybe 'as the difference between util_avg and
> util_est_enqueued' ?

Indeed, it was not clear that it was a formula.

Talking about formulas, I remember laying down the relations between the
various flavors of util signals in the v2 thread. This could be turned
rather easily into a doc page for PELT, along with a
signal-processing-friendly accurate description of how the PELT signals
are built. Would such a patch be of any interest to the kernel tree?

>> of how much extra utilization this tasks is actually using, compared to
>> our best current stable knowledge of it (which is util_est_enqueued).
>>
>> When the set of runnable tasks changes, the boost is disabled as the
>> impact of blocked utilization on util_avg will make the delta with
>> util_est_enqueued not very informative.
>
>> @@ -561,6 +604,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>> 		}
>> 	}
>>
>> +
>> 	return get_next_freq(sg_policy, util, max);
>> }
>
> Surely we can do without this extra whitespace? :-)

woops ...

Cheers,
Douglas
On 11/10/2019 15:44, Douglas RAILLARD wrote:

[...]

> @@ -181,6 +185,42 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
> 	}
> }
>
> +static unsigned long sugov_cpu_ramp_boost(struct sugov_cpu *sg_cpu)
> +{
> +	return READ_ONCE(sg_cpu->ramp_boost);
> +}
> +
> +static unsigned long sugov_cpu_ramp_boost_update(struct sugov_cpu *sg_cpu)
> +{
> +	struct rq *rq = cpu_rq(sg_cpu->cpu);
> +	unsigned long util_est_enqueued;
> +	unsigned long util_avg;
> +	unsigned long boost = 0;
> +
> +	util_est_enqueued = READ_ONCE(rq->cfs.avg.util_est.enqueued);
> +	util_avg = READ_ONCE(rq->cfs.avg.util_avg);
> +
> +	/*
> +	 * Boost when util_avg becomes higher than the previous stable
> +	 * knowledge of the enqueued tasks' set util, which is CPU's
> +	 * util_est_enqueued.
> +	 *
> +	 * We try to spot changes in the workload itself, so we want to
> +	 * avoid the noise of tasks being enqueued/dequeued. To do that,
> +	 * we only trigger boosting when the "amount of work' enqueued

s/"amount of work'/"amount of work" or 'amount of work'

[...]

> @@ -552,6 +593,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
> 		unsigned long j_util, j_max;
>
> 		j_util = sugov_get_util(j_sg_cpu);
> +		if (j_sg_cpu == sg_cpu)
> +			sugov_cpu_ramp_boost_update(sg_cpu);

Can you not call this already in sugov_update_shared(), like in the
sugov_update_single() case?
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index e35c20b42780..4c53f63a537d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -595,8 +595,6 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		unsigned long j_util, j_max;

 		j_util = sugov_get_util(j_sg_cpu);
-		if (j_sg_cpu == sg_cpu)
-			sugov_cpu_ramp_boost_update(sg_cpu);
 		j_max = j_sg_cpu->max;
 		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);

@@ -625,6 +623,7 @@ sugov_update_shared(struct update_util_data *hook, u64 time, unsigned int flags)
 	ignore_dl_rate_limit(sg_cpu, sg_policy);

 	if (sugov_should_update_freq(sg_policy, time)) {
+		sugov_cpu_ramp_boost_update(sg_cpu);
 		next_f = sugov_next_freq_shared(sg_cpu, time);

[...]
On 10/17/19 9:57 AM, Dietmar Eggemann wrote:
> On 11/10/2019 15:44, Douglas RAILLARD wrote:
>
> [...]
>
>> @@ -181,6 +185,42 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
>> 	}
>> }
>>
>> +static unsigned long sugov_cpu_ramp_boost(struct sugov_cpu *sg_cpu)
>> +{
>> +	return READ_ONCE(sg_cpu->ramp_boost);
>> +}
>> +
>> +static unsigned long sugov_cpu_ramp_boost_update(struct sugov_cpu *sg_cpu)
>> +{
>> +	struct rq *rq = cpu_rq(sg_cpu->cpu);
>> +	unsigned long util_est_enqueued;
>> +	unsigned long util_avg;
>> +	unsigned long boost = 0;
>> +
>> +	util_est_enqueued = READ_ONCE(rq->cfs.avg.util_est.enqueued);
>> +	util_avg = READ_ONCE(rq->cfs.avg.util_avg);
>> +
>> +	/*
>> +	 * Boost when util_avg becomes higher than the previous stable
>> +	 * knowledge of the enqueued tasks' set util, which is CPU's
>> +	 * util_est_enqueued.
>> +	 *
>> +	 * We try to spot changes in the workload itself, so we want to
>> +	 * avoid the noise of tasks being enqueued/dequeued. To do that,
>> +	 * we only trigger boosting when the "amount of work' enqueued
>
> s/"amount of work'/"amount of work" or 'amount of work'
>
> [...]
>
>> @@ -552,6 +593,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
>> 		unsigned long j_util, j_max;
>>
>> 		j_util = sugov_get_util(j_sg_cpu);
>> +		if (j_sg_cpu == sg_cpu)
>> +			sugov_cpu_ramp_boost_update(sg_cpu);
>
> Can you not call this already in sugov_update_shared(), like in the
> sugov_update_single() case?

The next commit in the series needs to aggregate the ramp_boost of all
CPUs in the policy, so this call will end up here anyway, unless we want
to set the value at the previous level and query it back again in the
loop. I don't mind either way, but since no option seems faster than the
other, I went for clustering the ramp boost code rather than spreading
it at all levels.
> diff --git a/kernel/sched/cpufreq_schedutil.c
> b/kernel/sched/cpufreq_schedutil.c
> index e35c20b42780..4c53f63a537d 100644
> --- a/kernel/sched/cpufreq_schedutil.c
> +++ b/kernel/sched/cpufreq_schedutil.c
> @@ -595,8 +595,6 @@ static unsigned int sugov_next_freq_shared(struct
> sugov_cpu *sg_cpu, u64 time)
> 		unsigned long j_util, j_max;
>
> 		j_util = sugov_get_util(j_sg_cpu);
> -		if (j_sg_cpu == sg_cpu)
> -			sugov_cpu_ramp_boost_update(sg_cpu);
> 		j_max = j_sg_cpu->max;
> 		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);
>
> @@ -625,6 +623,7 @@ sugov_update_shared(struct update_util_data *hook,
> u64 time, unsigned int flags)
> 	ignore_dl_rate_limit(sg_cpu, sg_policy);
>
> 	if (sugov_should_update_freq(sg_policy, time)) {
> +		sugov_cpu_ramp_boost_update(sg_cpu);
> 		next_f = sugov_next_freq_shared(sg_cpu, time);
>
> [...]
>
diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c
index aab8c0498dd1..c118f85d1f3d 100644
--- a/kernel/sched/cpufreq_schedutil.c
+++ b/kernel/sched/cpufreq_schedutil.c
@@ -61,6 +61,10 @@ struct sugov_cpu {
 	unsigned long bw_dl;
 	unsigned long max;

+	unsigned long ramp_boost;
+	unsigned long util_est_enqueued;
+	unsigned long util_avg;
+
 	/* The field below is for single-CPU policies only: */
 #ifdef CONFIG_NO_HZ_COMMON
 	unsigned long saved_idle_calls;
@@ -181,6 +185,42 @@ static void sugov_deferred_update(struct sugov_policy *sg_policy, u64 time,
 	}
 }

+static unsigned long sugov_cpu_ramp_boost(struct sugov_cpu *sg_cpu)
+{
+	return READ_ONCE(sg_cpu->ramp_boost);
+}
+
+static unsigned long sugov_cpu_ramp_boost_update(struct sugov_cpu *sg_cpu)
+{
+	struct rq *rq = cpu_rq(sg_cpu->cpu);
+	unsigned long util_est_enqueued;
+	unsigned long util_avg;
+	unsigned long boost = 0;
+
+	util_est_enqueued = READ_ONCE(rq->cfs.avg.util_est.enqueued);
+	util_avg = READ_ONCE(rq->cfs.avg.util_avg);
+
+	/*
+	 * Boost when util_avg becomes higher than the previous stable
+	 * knowledge of the enqueued tasks' set util, which is CPU's
+	 * util_est_enqueued.
+	 *
+	 * We try to spot changes in the workload itself, so we want to
+	 * avoid the noise of tasks being enqueued/dequeued. To do that,
+	 * we only trigger boosting when the "amount of work' enqueued
+	 * is stable.
+	 */
+	if (util_est_enqueued == sg_cpu->util_est_enqueued &&
+	    util_avg >= sg_cpu->util_avg &&
+	    util_avg > util_est_enqueued)
+		boost = util_avg - util_est_enqueued;
+
+	sg_cpu->util_est_enqueued = util_est_enqueued;
+	sg_cpu->util_avg = util_avg;
+	WRITE_ONCE(sg_cpu->ramp_boost, boost);
+	return boost;
+}
+
 /**
  * get_next_freq - Compute a new frequency for a given cpufreq policy.
  * @sg_policy: schedutil policy object to compute the new frequency for.
@@ -512,6 +552,7 @@ static void sugov_update_single(struct update_util_data *hook, u64 time,
 	busy = !sg_policy->need_freq_update && sugov_cpu_is_busy(sg_cpu);

 	util = sugov_get_util(sg_cpu);
+	sugov_cpu_ramp_boost_update(sg_cpu);
 	max = sg_cpu->max;
 	util = sugov_iowait_apply(sg_cpu, time, util, max);
 	next_f = get_next_freq(sg_policy, util, max);
@@ -552,6 +593,8 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		unsigned long j_util, j_max;

 		j_util = sugov_get_util(j_sg_cpu);
+		if (j_sg_cpu == sg_cpu)
+			sugov_cpu_ramp_boost_update(sg_cpu);
 		j_max = j_sg_cpu->max;
 		j_util = sugov_iowait_apply(j_sg_cpu, time, j_util, j_max);

@@ -561,6 +604,7 @@ static unsigned int sugov_next_freq_shared(struct sugov_cpu *sg_cpu, u64 time)
 		}
 	}

+
 	return get_next_freq(sg_policy, util, max);
 }
Use the utilization signals dynamic to detect when the utilization of a
set of tasks starts increasing because of a change in tasks' behavior.
This allows detecting when spending extra power for faster frequency
ramp up response would be beneficial to the reactivity of the system.

This ramp boost is computed as the difference
util_avg-util_est_enqueued. This number somehow represents a lower bound
of how much extra utilization this tasks is actually using, compared to
our best current stable knowledge of it (which is util_est_enqueued).

When the set of runnable tasks changes, the boost is disabled as the
impact of blocked utilization on util_avg will make the delta with
util_est_enqueued not very informative.

Signed-off-by: Douglas RAILLARD <douglas.raillard@arm.com>
---
 kernel/sched/cpufreq_schedutil.c | 44 ++++++++++++++++++++++++++++++++
 1 file changed, 44 insertions(+)
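For readers following the discussion without the kernel tree at hand, the boost condition described above can be modeled as a standalone sketch. This is not the kernel code: `struct ramp_state` and `ramp_boost_update()` are hypothetical stand-ins for the per-CPU fields and `sugov_cpu_ramp_boost_update()` from the patch, with the rq accessors and READ_ONCE/WRITE_ONCE stripped out so the heuristic itself is testable in isolation.

```c
#include <assert.h>

/* Stand-in for the per-CPU fields the patch adds to struct sugov_cpu. */
struct ramp_state {
	unsigned long util_est_enqueued;	/* last stable estimate seen */
	unsigned long util_avg;			/* last util_avg seen */
	unsigned long ramp_boost;		/* current boost value */
};

/*
 * Model of the heuristic: boost only when the enqueued task set is
 * stable (util_est_enqueued unchanged since the last update), util_avg
 * is not decreasing, and util_avg has climbed above the stable
 * estimate. The boost is the delta util_avg - util_est_enqueued, a
 * lower bound on the extra utilization the task set is now using.
 */
static unsigned long ramp_boost_update(struct ramp_state *s,
				       unsigned long util_est_enqueued,
				       unsigned long util_avg)
{
	unsigned long boost = 0;

	if (util_est_enqueued == s->util_est_enqueued &&
	    util_avg >= s->util_avg &&
	    util_avg > util_est_enqueued)
		boost = util_avg - util_est_enqueued;

	s->util_est_enqueued = util_est_enqueued;
	s->util_avg = util_avg;
	s->ramp_boost = boost;
	return boost;
}
```

Note how a change in util_est_enqueued between two updates zeroes the boost, matching the commit message: when the runnable task set changes, blocked utilization pollutes util_avg and the delta stops being informative.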