diff mbox series

[PATCHv2,3/3] cpuidle: teo: Don't count non-existent intercepts

Message ID 20240611112413.1241352-4-christian.loehle@arm.com (mailing list archive)
State Superseded, archived
Headers show
Series cpuidle: teo: Fixing utilization and intercept logic | expand

Commit Message

Christian Loehle June 11, 2024, 11:24 a.m. UTC
When bailing out early, teo will not query the sleep length anymore
since commit 6da8f9ba5a87 ("cpuidle: teo:
Skip tick_nohz_get_sleep_length() call in some cases") with an
expected sleep_length_ns value of KTIME_MAX.
This lead to state0 accumulating lots of 'intercepts' because
the actually measured sleep length was < KTIME_MAX, so count KTIME_MAX
as a hit (we have to count them as something otherwise we are stuck)
and therefore teo taking too long to recover from intercept-heavy
scenarios.

Fundamentally we can only do one of the two:
1. Skip sleep_length_ns query when we think intercept is likely
2. Have accurate data if sleep_length_ns is actually intercepted when
we believe it is currently intercepted.

This patch chooses the latter as I've found the additional time it
takes to query the sleep length to be negligible and the variants of
option 1 (count all unknowns as misses or count all unknown as hits)
had significant regressions (as misses had lots of too shallow idle
state selections and as hits had terrible performance in
intercept-heavy workloads).

Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases")
Signed-off-by: Christian Loehle <christian.loehle@arm.com>
---
 drivers/cpuidle/governors/teo.c | 18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

Comments

Dietmar Eggemann June 26, 2024, 1:09 p.m. UTC | #1
On 11/06/2024 13:24, Christian Loehle wrote:
> When bailing out early, teo will not query the sleep length anymore
> since commit 6da8f9ba5a87 ("cpuidle: teo:
> Skip tick_nohz_get_sleep_length() call in some cases") with an
> expected sleep_length_ns value of KTIME_MAX.
> This lead to state0 accumulating lots of 'intercepts' because
> the actually measured sleep length was < KTIME_MAX, so count KTIME_MAX
> as a hit (we have to count them as something otherwise we are stuck)
> and therefore teo taking too long to recover from intercept-heavy
> scenarios.
> 
> Fundamentally we can only do one of the two:
> 1. Skip sleep_length_ns query when we think intercept is likely

Isn't this what we do right now? Skip tick_nohz_get_sleep_length() for
state0 and set cpu_data->sleep_length_ns to KTIME_MAX in teo_select()
and then count this as an 'intercept' in teo_update().

> 2. Have accurate data if sleep_length_ns is actually intercepted when
> we believe it is currently intercepted.
> 
> This patch chooses the latter as I've found the additional time it
> takes to query the sleep length to be negligible and the variants of
> option 1 (count all unknowns as misses or count all unknown as hits)
> had significant regressions (as misses had lots of too shallow idle
> state selections and as hits had terrible performance in
> intercept-heavy workloads).

IMHO, you do 2 things here:

(1) Set 'cpu_data->sleep_length_ns != KTIME_MAX' for '!idx &&
    prev_intercept_idx' in teo_select().

(2) Force an update with 'cpu_data->sleep_length_ns == KTIME_MAX' to be
    counted as a 'hit' rather an 'intercept' in teo_update().

Can't really see how this matches the explanatory text exactly.


> 
> Fixes: 6da8f9ba5a87 ("cpuidle: teo: Skip tick_nohz_get_sleep_length() call in some cases")
> Signed-off-by: Christian Loehle <christian.loehle@arm.com>
> ---
>  drivers/cpuidle/governors/teo.c | 18 ++++++++++++++++--
>  1 file changed, 16 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
> index cc7df59f488d..1e4b40474f49 100644
> --- a/drivers/cpuidle/governors/teo.c
> +++ b/drivers/cpuidle/governors/teo.c
> @@ -231,8 +231,13 @@ static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
>  	 * length, this is a "hit", so update the "hits" metric for that bin.
>  	 * Otherwise, update the "intercepts" metric for the bin fallen into by
>  	 * the measured idle duration.
> +	 * If teo_select() bailed out early, we have to count this as a hit as
> +	 * we don't know what the true sleep length would've been. Otherwise
> +	 * we accumulate lots of intercepts at the shallower state (especially
> +	 * state0) even though there weren't any intercepts due to us
> +	 * anticipating one.
>  	 */
> -	if (idx_timer == idx_duration)
> +	if (idx_timer == idx_duration || cpu_data->sleep_length_ns == KTIME_MAX)
>  		cpu_data->state_bins[idx_timer].hits += PULSE;
>  	else
>  		cpu_data->state_bins[idx_duration].intercepts += PULSE;
> @@ -292,7 +297,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  	unsigned int hit_sum = 0;
>  	int constraint_idx = 0;
>  	int idx0 = 0, idx = -1;
> -	bool alt_intercepts, alt_recent;
> +	int prev_intercept_idx;
>  	s64 duration_ns;
>  	int i;
>  
> @@ -370,6 +375,7 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  	 * all of the deeper states a shallower idle state is likely to be a
>  	 * better choice.
>  	 */
> +	prev_intercept_idx = idx;
>  	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
>  		int first_suitable_idx = idx;
>  
> @@ -421,6 +427,14 @@ static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>  			first_suitable_idx = i;
>  		}
>  	}
> +	if (!idx && prev_intercept_idx) {
> +		/*
> +		 * We have to query the sleep length here otherwise we don't
> +		 * know after wakeup if our guess was correct.
> +		 */
> +		duration_ns = tick_nohz_get_sleep_length(&delta_tick);
> +		cpu_data->sleep_length_ns = duration_ns;
> +	}
>  
>  	/*
>  	 * If there is a latency constraint, it may be necessary to select an
Christian Loehle June 28, 2024, 9:33 a.m. UTC | #2
On 6/26/24 14:09, Dietmar Eggemann wrote:
> On 11/06/2024 13:24, Christian Loehle wrote:
>> When bailing out early, teo will not query the sleep length anymore
>> since commit 6da8f9ba5a87 ("cpuidle: teo:
>> Skip tick_nohz_get_sleep_length() call in some cases") with an
>> expected sleep_length_ns value of KTIME_MAX.
>> This lead to state0 accumulating lots of 'intercepts' because
>> the actually measured sleep length was < KTIME_MAX, so count KTIME_MAX
>> as a hit (we have to count them as something otherwise we are stuck)
>> and therefore teo taking too long to recover from intercept-heavy
>> scenarios.
>>
>> Fundamentally we can only do one of the two:
>> 1. Skip sleep_length_ns query when we think intercept is likely
> 
> Isn't this what we do right now? Skip tick_nohz_get_sleep_length() for
> state0 and set cpu_data->sleep_length_ns to KTIME_MAX in teo_select()
> and then count this as an 'intercept' in teo_update().

Yes it is, I'll state that more explicitly in the future, it was
kind of implied by "we can only do one of the two" and "This patch
chooses the latter".

> 
>> 2. Have accurate data if sleep_length_ns is actually intercepted when
>> we believe it is currently intercepted.
>>
>> This patch chooses the latter as I've found the additional time it
>> takes to query the sleep length to be negligible and the variants of
>> option 1 (count all unknowns as misses or count all unknown as hits)
>> had significant regressions (as misses had lots of too shallow idle
>> state selections and as hits had terrible performance in
>> intercept-heavy workloads).
> 
> IMHO, you do 2 things here:
> 
> (1) Set 'cpu_data->sleep_length_ns != KTIME_MAX' for '!idx &&
>     prev_intercept_idx' in teo_select().

This is just the technical way of saying "We query the sleep length
even though teo is in an intercept scenario (and don't use the sleep
length for determining the current idle state selection)".

> 
> (2) Force an update with 'cpu_data->sleep_length_ns == KTIME_MAX' to be
>     counted as a 'hit' rather an 'intercept' in teo_update().

I assume this is the one you have an issue with and you're correct.
This isn't quite obvious from the text.
Furthermore the KTIME_MAX won't be set for intercept-bailouts anyway
with (1) and thus doesn't become that important in terms of resulting
behavior. I still think (2) is the semantically correct thing to do,
but I'll drop it for the fixes series and add it to the remaining
'improvements' pile.
Thanks!

> 
> Can't really see how this matches the explanatory text exactly.
> [SNIP]
diff mbox series

Patch

diff --git a/drivers/cpuidle/governors/teo.c b/drivers/cpuidle/governors/teo.c
index cc7df59f488d..1e4b40474f49 100644
--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -231,8 +231,13 @@  static void teo_update(struct cpuidle_driver *drv, struct cpuidle_device *dev)
 	 * length, this is a "hit", so update the "hits" metric for that bin.
 	 * Otherwise, update the "intercepts" metric for the bin fallen into by
 	 * the measured idle duration.
+	 * If teo_select() bailed out early, we have to count this as a hit as
+	 * we don't know what the true sleep length would've been. Otherwise
+	 * we accumulate lots of intercepts at the shallower state (especially
+	 * state0) even though there weren't any intercepts due to us
+	 * anticipating one.
 	 */
-	if (idx_timer == idx_duration)
+	if (idx_timer == idx_duration || cpu_data->sleep_length_ns == KTIME_MAX)
 		cpu_data->state_bins[idx_timer].hits += PULSE;
 	else
 		cpu_data->state_bins[idx_duration].intercepts += PULSE;
@@ -292,7 +297,7 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	unsigned int hit_sum = 0;
 	int constraint_idx = 0;
 	int idx0 = 0, idx = -1;
-	bool alt_intercepts, alt_recent;
+	int prev_intercept_idx;
 	s64 duration_ns;
 	int i;
 
@@ -370,6 +375,7 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 	 * all of the deeper states a shallower idle state is likely to be a
 	 * better choice.
 	 */
+	prev_intercept_idx = idx;
 	if (2 * idx_intercept_sum > cpu_data->total - idx_hit_sum) {
 		int first_suitable_idx = idx;
 
@@ -421,6 +427,14 @@  static int teo_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
 			first_suitable_idx = i;
 		}
 	}
+	if (!idx && prev_intercept_idx) {
+		/*
+		 * We have to query the sleep length here otherwise we don't
+		 * know after wakeup if our guess was correct.
+		 */
+		duration_ns = tick_nohz_get_sleep_length(&delta_tick);
+		cpu_data->sleep_length_ns = duration_ns;
+	}
 
 	/*
 	 * If there is a latency constraint, it may be necessary to select an