Message ID | 20230728145515.990749537@infradead.org (mailing list archive) |
---|---|
Headers | show |
Series | cpuidle,teo: Improve TEO tick decisions | expand |
Hi, On Fri, 28 Jul 2023, Peter Zijlstra wrote: > Hi, > > Wanted to send this yesterday, but my home server died and took everything > down :/ > > These patches are lightly tested but appear to behave as expected. > > As I was asked to see if the patches of Raphael improve the behavior, I rerun the tests with Raphaels v2 as well as with Peters RFC patchset. Here are the results (compared to upstream): upstream raphael v2 peter RFC Idle Total 2533 100.00% 1183 100.00% 5563 100.00% x >= 4ms 1458 57.56% 1151 97.30% 3385 60.85% 4ms> x >= 2ms 91 3.59% 12 1.01% 133 2.39% 2ms > x >= 1ms 56 2.21% 3 0.25% 80 1.44% 1ms > x >= 500us 64 2.53% 1 0.08% 98 1.76% 500us > x >= 250us 73 2.88% 0 0.00% 113 2.03% 250us > x >=100us 76 3.00% 2 0.17% 106 1.91% 100us > x >= 50us 33 1.30% 4 0.34% 75 1.35% 50us > x >= 25us 39 1.54% 4 0.34% 152 2.73% 25us > x >= 10us 199 7.86% 4 0.34% 404 7.26% 10us > x > 5us 156 6.16% 0 0.00% 477 8.57% 5us > x 288 11.37% 2 0.17% 540 9.71% tick_nohz_next_event() count 8839790 6142079 36623 Raphaels Approach still does the tick_nohz_get_sleep_length() execution unconditional. It executes ~5000 times more tick_nohz_next_event() as the tick is stopped. This relation improved massively in Peters approach (factor is ~7). Thanks, Anna-Maria
On Mon, Jul 31, 2023 at 1:47 PM Anna-Maria Behnsen <anna-maria@linutronix.de> wrote: > > Hi, > > On Fri, 28 Jul 2023, Peter Zijlstra wrote: > > > Hi, > > > > Wanted to send this yesterday, but my home server died and took everything > > down :/ > > > > These patches are lightly tested but appear to behave as expected. > > > > > > As I was asked to see if the patches of Raphael improve the behavior, I > rerun the tests with Raphaels v2 as well as with Peters RFC patchset. Here > are the results (compared to upstream): Thanks for the numbers! > upstream raphael v2 peter RFC > > Idle Total 2533 100.00% 1183 100.00% 5563 100.00% > x >= 4ms 1458 57.56% 1151 97.30% 3385 60.85% > 4ms> x >= 2ms 91 3.59% 12 1.01% 133 2.39% > 2ms > x >= 1ms 56 2.21% 3 0.25% 80 1.44% > 1ms > x >= 500us 64 2.53% 1 0.08% 98 1.76% > 500us > x >= 250us 73 2.88% 0 0.00% 113 2.03% > 250us > x >=100us 76 3.00% 2 0.17% 106 1.91% > 100us > x >= 50us 33 1.30% 4 0.34% 75 1.35% > 50us > x >= 25us 39 1.54% 4 0.34% 152 2.73% > 25us > x >= 10us 199 7.86% 4 0.34% 404 7.26% > 10us > x > 5us 156 6.16% 0 0.00% 477 8.57% > 5us > x 288 11.37% 2 0.17% 540 9.71% > > tick_nohz_next_event() > count 8839790 6142079 36623 > > Raphaels Approach still does the tick_nohz_get_sleep_length() execution > unconditional. It executes ~5000 times more tick_nohz_next_event() as the > tick is stopped. This relation improved massively in Peters approach > (factor is ~7). So I'm drawing slightly different conclusions from the above. First, because there are different numbers of idle cycles in each case, I think it only makes sense to look at the percentages. So in both the "upstream" and "Peter RFC" cases there is a significant percentage of times when the tick was stopped and the measure idle duration was below 25 us (25.39% and 25.54%, respectively), whereas in the "Rafael v2" case that is only 0.51% of times (not even 1%). This means a huge improvement to me, because all of these cases mean that the governor incorrectly told the idle loop to stop the tick (it must have selected a shallow state and so it should have not stopped the tick in those cases). To me, this clearly means fixing a real bug in the governor. Now, I said in the changelog of my v1 that the goal was not to reduce the number of tick_nohz_get_sleep_length() invocations, so I'm not sure why it is regarded as a comparison criteria.
On Mon, 31 Jul 2023, Rafael J. Wysocki wrote: > On Mon, Jul 31, 2023 at 1:47 PM Anna-Maria Behnsen > <anna-maria@linutronix.de> wrote: > > > > Hi, > > > > On Fri, 28 Jul 2023, Peter Zijlstra wrote: > > > > > Hi, > > > > > > Wanted to send this yesterday, but my home server died and took everything > > > down :/ > > > > > > These patches are lightly tested but appear to behave as expected. > > > > > > > > > > As I was asked to see if the patches of Raphael improve the behavior, I > > rerun the tests with Raphaels v2 as well as with Peters RFC patchset. Here > > are the results (compared to upstream): > > Thanks for the numbers! > > > upstream raphael v2 peter RFC > > > > Idle Total 2533 100.00% 1183 100.00% 5563 100.00% > > x >= 4ms 1458 57.56% 1151 97.30% 3385 60.85% > > 4ms> x >= 2ms 91 3.59% 12 1.01% 133 2.39% > > 2ms > x >= 1ms 56 2.21% 3 0.25% 80 1.44% > > 1ms > x >= 500us 64 2.53% 1 0.08% 98 1.76% > > 500us > x >= 250us 73 2.88% 0 0.00% 113 2.03% > > 250us > x >=100us 76 3.00% 2 0.17% 106 1.91% > > 100us > x >= 50us 33 1.30% 4 0.34% 75 1.35% > > 50us > x >= 25us 39 1.54% 4 0.34% 152 2.73% > > 25us > x >= 10us 199 7.86% 4 0.34% 404 7.26% > > 10us > x > 5us 156 6.16% 0 0.00% 477 8.57% > > 5us > x 288 11.37% 2 0.17% 540 9.71% > > > > tick_nohz_next_event() > > count 8839790 6142079 36623 > > > > Raphaels Approach still does the tick_nohz_get_sleep_length() execution > > unconditional. It executes ~5000 times more tick_nohz_next_event() as the > > tick is stopped. This relation improved massively in Peters approach > > (factor is ~7). > > So I'm drawing slightly different conclusions from the above. > > First, because there are different numbers of idle cycles in each > case, I think it only makes sense to look at the percentages. > > So in both the "upstream" and "Peter RFC" cases there is a significant > percentage of times when the tick was stopped and the measure idle > duration was below 25 us (25.39% and 25.54%, respectively), whereas in > the "Rafael v2" case that is only 0.51% of times (not even 1%). This > means a huge improvement to me, because all of these cases mean that > the governor incorrectly told the idle loop to stop the tick (it must > have selected a shallow state and so it should have not stopped the > tick in those cases). To me, this clearly means fixing a real bug in > the governor. > > Now, I said in the changelog of my v1 that the goal was not to reduce > the number of tick_nohz_get_sleep_length() invocations, so I'm not > sure why it is regarded as a comparison criteria. > I just mentioned this relation as the percentage values are obvious. I know, your approach does not adress the tick_nohz_get_sleep_length invocations, but it might be worth a try - I saw your new patchset adressing it and will have a deeper look at it today/tomorrow. Thanks! I also try to play around with the timer logics (moving marking timer base idle into tick_nohz_stop_tick), but the results does not look pretty good at the moment.