Message ID | 57E5A37B.8010802@nvidia.com (mailing list archive)
---|---
State | RFC, archived
On Fri, Sep 23, 2016 at 02:49:47PM -0700, Sai Gurrappadi wrote:
> When triaging a performance degradation of ~5% on some use cases between
> our k3.18 and k4.4 trees, we found that the menu cpuidle governor on k4.4
> was way more aggressive when requesting for deeper idle states. It would
> often get it wrong though resulting in perf loss.
>
> The menu governor tries to bias picking shallower idle states based on the
> historical load on the CPU. The busier the CPU, the shallower the idle
> state.
>
> However, after commit "3289bdb sched: Move the loadavg code to a more

The normal quoting style is:

  3289bdb42988 ("sched: Move the loadavg code to a more obvious location")

Use at least 12 SHA1 characters, collisions on 7-8 char abbrevs have
already happened.

Also, that was more than a year ago, nobody noticed?

> obvious location", the load metric it looks at is rq->load.weight which is
> the instantaneous se->load.weight sum for top level entities on the rq
> which on idle entry is always 0 (for the common case at least) because
> there is nothing on the cfs rq.
>
> The previous metric the menu governor used was rq->cpu_load[0] which is a
> snap shot of the weighted_cpuload at the previous load update point so it
> isn't always 0 on idle entry.

Right, basically a 'random' number :)

> Unfortunately, it isn't straightforward to switch the metric being used to
> rq->cfs.load_avg or rq->cfs.util_avg because they overestimate the load a
> lot more than rq->cpu_load[0] (include blocked task contrib.). That would
> potentially require redoing the magic constants in the menu governor's
> performance_multiplier...so for now, use rq->cpu_load[0] instead to
> preserve old behaviour.

Works for me I suppose, but I would indeed encourage people to
investigate better options.
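To make the distinction under discussion concrete, the sketch below models the two metrics: rq->load.weight is the instantaneous sum of the runnable entities' weights and drops to zero as the CPU enters idle, while rq->cpu_load[0] is a snapshot taken at the last load update (scheduler tick). The names and the update rule are simplified stand-ins for the v4.4-era scheduler, not verbatim kernel source.

/*
 * Illustrative model only -- not kernel code.  "load_weight" stands in
 * for rq->load.weight and "cpu_load0" for rq->cpu_load[0].
 */
struct rq_model {
	unsigned long load_weight;	/* instantaneous sum of se->load.weight of runnable entities */
	unsigned long cpu_load0;	/* snapshot taken at the last load update (tick) */
};

/*
 * On idle entry the last task has been dequeued, so the instantaneous
 * sum is (almost always) zero ...
 */
static void idle_entry(struct rq_model *rq)
{
	rq->load_weight = 0;		/* nothing left on the cfs rq */
	/* cpu_load0 is untouched here; it only changes at tick time */
}

/*
 * ... while the snapshot is refreshed only at load-update (tick) time,
 * so on idle entry it still reflects how busy the CPU was a moment ago.
 */
static void tick_load_update(struct rq_model *rq)
{
	rq->cpu_load0 = rq->load_weight;	/* stand-in for weighted_cpuload(); decay into cpu_load[1..4] elided */
}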
On 09/23/2016 03:06 PM, Peter Zijlstra wrote:
> On Fri, Sep 23, 2016 at 02:49:47PM -0700, Sai Gurrappadi wrote:
>> When triaging a performance degradation of ~5% on some use cases between
>> our k3.18 and k4.4 trees, we found that the menu cpuidle governor on k4.4
>> was way more aggressive when requesting for deeper idle states. It would
>> often get it wrong though resulting in perf loss.
>>
>> The menu governor tries to bias picking shallower idle states based on the
>> historical load on the CPU. The busier the CPU, the shallower the idle
>> state.
>>
>> However, after commit "3289bdb sched: Move the loadavg code to a more
>
> The normal quoting style is:
>
>   3289bdb42988 ("sched: Move the loadavg code to a more obvious location")
>
> Use at least 12 SHA1 characters, collisions on 7-8 char abbrevs have
> already happened.
>
> Also, that was more than a year ago, nobody noticed?

Ah, sorry. Will do next time.

>
>> obvious location", the load metric it looks at is rq->load.weight which is
>> the instantaneous se->load.weight sum for top level entities on the rq
>> which on idle entry is always 0 (for the common case at least) because
>> there is nothing on the cfs rq.
>>
>> The previous metric the menu governor used was rq->cpu_load[0] which is a
>> snap shot of the weighted_cpuload at the previous load update point so it
>> isn't always 0 on idle entry.
>
> Right, basically a 'random' number :)

Indeed. I never really understood how things worked with the cpu_load stuff
given how random it seemed.

>
>> Unfortunately, it isn't straightforward to switch the metric being used to
>> rq->cfs.load_avg or rq->cfs.util_avg because they overestimate the load a
>> lot more than rq->cpu_load[0] (include blocked task contrib.). That would
>> potentially require redoing the magic constants in the menu governor's
>> performance_multiplier...so for now, use rq->cpu_load[0] instead to
>> preserve old behaviour.
>
> Works for me I suppose, but I would indeed encourage people to
> investigate better options.

Thanks for the review :)

-Sai
On Friday, September 23, 2016 03:44:23 PM Sai Gurrappadi wrote:
>
> On 09/23/2016 03:06 PM, Peter Zijlstra wrote:
> > On Fri, Sep 23, 2016 at 02:49:47PM -0700, Sai Gurrappadi wrote:
> >> When triaging a performance degradation of ~5% on some use cases between
> >> our k3.18 and k4.4 trees, we found that the menu cpuidle governor on k4.4
> >> was way more aggressive when requesting for deeper idle states. It would
> >> often get it wrong though resulting in perf loss.
> >>
> >> The menu governor tries to bias picking shallower idle states based on the
> >> historical load on the CPU. The busier the CPU, the shallower the idle
> >> state.
> >>
> >> However, after commit "3289bdb sched: Move the loadavg code to a more
> >
> > The normal quoting style is:
> >
> >   3289bdb42988 ("sched: Move the loadavg code to a more obvious location")
> >
> > Use at least 12 SHA1 characters, collisions on 7-8 char abbrevs have
> > already happened.
> >
> > Also, that was more than a year ago, nobody noticed?
>
> Ah, sorry. Will do next time.
>
> >
> >> obvious location", the load metric it looks at is rq->load.weight which is
> >> the instantaneous se->load.weight sum for top level entities on the rq
> >> which on idle entry is always 0 (for the common case at least) because
> >> there is nothing on the cfs rq.
> >>
> >> The previous metric the menu governor used was rq->cpu_load[0] which is a
> >> snap shot of the weighted_cpuload at the previous load update point so it
> >> isn't always 0 on idle entry.
> >
> > Right, basically a 'random' number :)
>
> Indeed. I never really understood how things worked with the cpu_load stuff
> given how random it seemed.

Well, the choice seems to be between "better performance, but we don't really
know how we get that" and "more idle and we understand how it works".

Honestly, I prefer to understand how it works in the first place.

Of course, the fact that the metric used currently is (almost) always 0 is a
problem, but it doesn't seem like going back to the old one would be a huge
improvement either.

Thanks,
Rafael
On 09/29/2016 06:22 AM, Rafael J. Wysocki wrote:
> On Friday, September 23, 2016 03:44:23 PM Sai Gurrappadi wrote:
>>
>> On 09/23/2016 03:06 PM, Peter Zijlstra wrote:
>>>> On Fri, Sep 23, 2016 at 02:49:47PM -0700, Sai Gurrappadi wrote:

...

>>>> The previous metric the menu governor used was rq->cpu_load[0] which is a
>>>> snap shot of the weighted_cpuload at the previous load update point so it
>>>> isn't always 0 on idle entry.
>>>
>>> Right, basically a 'random' number :)
>>
>> Indeed. I never really understood how things worked with the cpu_load stuff
>> given how random it seemed.
>
> Well, the choice seems to be between "better performance, but we don't really
> know how we get that" and "more idle and we understand how it works".
>
> Honestly, I prefer to understand how it works in the first place.

A busier CPU wants to enter shallower idle states because the cost of
entering deeper idle states (exit latency wise) is bigger.

The performance_multiplier stuff in the menu governor tries to map a cpu
load metric -> exit latency tolerance by tweaking a fudge factor. This idea
at least makes sense. The problem is that the cpu load metric it used
(rq->cpu_load[0]) isn't the best thing for this purpose because it is highly
dependent on how the sched tick aligns with the workload.

We found that not considering CPU load in the menu governor did result in
worse idle state prediction resulting in performance loss due to the higher
exit latency of the deeper idle states.

>
> Of course, the fact that the metric used currently is (almost) always 0 is a
> problem, but it doesn't seem like going back to the old one would be a huge
> improvement either.

Yup, I agree. Ideally we redo the performance_multiplier stuff in the menu
governor to use a better metric (rq->cfs.avg.load_avg?) or maybe we go
address why the predictor fails in the first place in a different manner.

Do note that this stuff was using rq->cpu_load when it still used
rq->load.weight as the input metric. This was switched once we added the
PELT metric but the fudge factors in performance_multiplier haven't changed
so that in itself might be a good enough reason to redo it..

Thanks,
-Sai
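For readers following along, here is a paraphrased sketch of the mechanism Sai describes, loosely based on drivers/cpuidle/governors/menu.c around v4.4. The shape of the calculation follows the kernel, but the conversion of the raw load value is a simplified stand-in for get_loadavg(), and latency_budget() is a hypothetical wrapper added only to show how the multiplier caps the tolerated exit latency; it is not a kernel function.

/*
 * Paraphrased from drivers/cpuidle/governors/menu.c around v4.4; the
 * helpers are simplified stand-ins, not an exact copy of kernel source.
 */
static int performance_multiplier(unsigned long nr_iowaiters, unsigned long load)
{
	int mult = 1;

	/*
	 * The busier the CPU (fixed-point loadavg-style value), the more
	 * reluctant we are to pick a deep state.
	 */
	mult += 2 * (int)(load >> 11);		/* simplified stand-in for get_loadavg(load) */

	/* tasks blocked on IO on this CPU also argue for shallow states */
	mult += 10 * (int)nr_iowaiters;

	return mult;
}

/*
 * Hypothetical wrapper (not in the kernel) showing how menu_select()
 * uses the multiplier: the predicted idle duration is divided by it and
 * the result caps the exit latency the governor will tolerate, so a
 * busy CPU is steered toward shallower states.
 */
static unsigned int latency_budget(unsigned int latency_req,
				   unsigned int predicted_us,
				   unsigned long nr_iowaiters,
				   unsigned long cpu_load)
{
	unsigned int interactivity_req =
		predicted_us / performance_multiplier(nr_iowaiters, cpu_load);

	return interactivity_req < latency_req ? interactivity_req : latency_req;
}

With rq->load.weight feeding cpu_load, the value is almost always 0 on idle entry, so the load contribution to the divisor vanishes and the governor loses its bias toward shallower states on busy CPUs, which is the regression being discussed.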
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 44817c6..d1aea12 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2955,7 +2955,7 @@ void get_iowait_load(unsigned long *nr_waiters, unsigned long *load)
 {
 	struct rq *rq = this_rq();
 	*nr_waiters = atomic_read(&rq->nr_iowait);
-	*load = rq->load.weight;
+	*load = rq->cpu_load[0];
 }
 
 #ifdef CONFIG_SMP
When triaging a performance degradation of ~5% on some use cases between
our k3.18 and k4.4 trees, we found that the menu cpuidle governor on k4.4
was way more aggressive when requesting deeper idle states. It would often
get it wrong though, resulting in perf loss.

The menu governor tries to bias picking shallower idle states based on the
historical load on the CPU. The busier the CPU, the shallower the idle
state.

However, after commit "3289bdb sched: Move the loadavg code to a more
obvious location", the load metric it looks at is rq->load.weight, which is
the instantaneous se->load.weight sum for top-level entities on the rq and
which on idle entry is always 0 (for the common case at least) because
there is nothing on the cfs rq.

The previous metric the menu governor used was rq->cpu_load[0], which is a
snapshot of the weighted_cpuload at the previous load update point, so it
isn't always 0 on idle entry.

Unfortunately, it isn't straightforward to switch the metric being used to
rq->cfs.load_avg or rq->cfs.util_avg because they overestimate the load a
lot more than rq->cpu_load[0] (they include blocked task contributions).
That would potentially require redoing the magic constants in the menu
governor's performance_multiplier... so for now, use rq->cpu_load[0]
instead to preserve the old behaviour.

Reported-by: Juha Lainema <jlainema@nvidia.com>
Signed-off-by: Sai Gurrappadi <sgurrappadi@nvidia.com>
---
* I realize this might not be the best thing to do, hence the RFC tag.
  Thoughts?

 kernel/sched/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)