Message ID | 20240809073120.250974-2-aboorvad@linux.ibm.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | cpuidle/menu: Address performance drop from favoring physical over polling cpuidle state | expand |
On 8/9/24 08:31, Aboorva Devarajan wrote: > Update the cpuidle menu governor to avoid prioritizing physical states > over polling states when predicted idle duration is lesser than the > physical states target residency duration for performance gains. > > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> > --- > drivers/cpuidle/governors/menu.c | 11 ----------- > 1 file changed, 11 deletions(-) > > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c > index f3c9d49f0f2a..cf99ca103f9b 100644 > --- a/drivers/cpuidle/governors/menu.c > +++ b/drivers/cpuidle/governors/menu.c > @@ -354,17 +354,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, > idx = i; /* first enabled state */ > > if (s->target_residency_ns > predicted_ns) { > - /* > - * Use a physical idle state, not busy polling, unless > - * a timer is going to trigger soon enough. > - */ > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && > - s->exit_latency_ns <= latency_req && > - s->target_residency_ns <= data->next_timer_ns) { > - predicted_ns = s->target_residency_ns; > - idx = i; > - break; > - } > if (predicted_ns < TICK_NSEC) > break; > How about this? data->next_timer_ns is set to KTIME_MAX in case the last few intervals were very short, which might be the case for you. -->8-- diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index f3c9d49f0f2a..77b40201c446 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -360,6 +360,7 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, */ if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && s->exit_latency_ns <= latency_req && + data->next_timer_ns != KTIME_MAX && s->target_residency_ns <= data->next_timer_ns) { predicted_ns = s->target_residency_ns; idx = i;
On 8/9/24 08:31, Aboorva Devarajan wrote: > Update the cpuidle menu governor to avoid prioritizing physical states > over polling states when predicted idle duration is lesser than the > physical states target residency duration for performance gains. I would use something like this as wording: [PATCH] cpuidle: menu: Select polling on short predicted idle Select a shallow state matching our predicted_ns even if it is a polling one. Additionally to querying the next timer event, menu also employs an interval tracking strategy of most recent idle durations for workloads that aren't woken up by predictable timer events. The logic predicts the next wakeup as predicted_ns, but the logic handling that skipped any polling states. In the worst-case on POWER we might only have snooze as polling and CEDE. This makes the entire logic around it a NOP as it never let's predicted_ns choose an appropriate idle state since it requires a non-polling one. To actually enforce predicted_ns for idle state selection actually use states even though they are polling ones. Even for a perfect recent intervals of [1, 1, 1, 1, 1, 1, 1, 1] menu previously chose the first non-idle state. ----- To mitigate potential side-effects since most platforms have a shallower idle state we could add something based on that, but really your patch would be the only consistent IMO. I'm not quite sure why this was put there in the first place. It was essentially introduced in commit ("69d25870f20c cpuidle: fix the menu governor to boost IO performance") with a 20us lower limit. > > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> > --- > drivers/cpuidle/governors/menu.c | 11 ----------- > 1 file changed, 11 deletions(-) > > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c > index f3c9d49f0f2a..cf99ca103f9b 100644 > --- a/drivers/cpuidle/governors/menu.c > +++ b/drivers/cpuidle/governors/menu.c > @@ -354,17 +354,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, > idx = i; /* first enabled state */ > > if (s->target_residency_ns > predicted_ns) { > - /* > - * Use a physical idle state, not busy polling, unless > - * a timer is going to trigger soon enough. > - */ > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && > - s->exit_latency_ns <= latency_req && > - s->target_residency_ns <= data->next_timer_ns) { > - predicted_ns = s->target_residency_ns; > - idx = i; > - break; > - } > if (predicted_ns < TICK_NSEC) > break; >
On Fri, Aug 09, 2024 at 01:01:20PM GMT, Aboorva Devarajan wrote: > Update the cpuidle menu governor to avoid prioritizing physical states > over polling states when predicted idle duration is lesser than the > physical states target residency duration for performance gains. > > Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> > --- > drivers/cpuidle/governors/menu.c | 11 ----------- > 1 file changed, 11 deletions(-) > > diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c > index f3c9d49f0f2a..cf99ca103f9b 100644 > --- a/drivers/cpuidle/governors/menu.c > +++ b/drivers/cpuidle/governors/menu.c > @@ -354,17 +354,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, > idx = i; /* first enabled state */ > > if (s->target_residency_ns > predicted_ns) { > - /* > - * Use a physical idle state, not busy polling, unless > - * a timer is going to trigger soon enough. > - */ > - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && > - s->exit_latency_ns <= latency_req && > - s->target_residency_ns <= data->next_timer_ns) { > - predicted_ns = s->target_residency_ns; > - idx = i; > - break; > - } > if (predicted_ns < TICK_NSEC) > break; > > -- > 2.39.3 > I tried to identify the number of times we enter the snooze and CEDE states on a pseries machine with and without this patch applied while running the daytrader database benchmark[1]. The results show that the number of times we enter CEDE goes down considerably: // Current upstream 6.11 | STATE | NO. OF TIMES ENTERED | TOTAL TIME SPENT IN STATE (US) | AVG TIME SPENT IN STATE (US) | |Snooze | 6,389 | 221,592 | 34.68 | | CEDE | 160,368 | 13,288,855 | 82.86 | // Current upstream 6.11 - with above patch applied | STATE | NO. OF TIMES ENTERED | TOTAL TIME SPENT IN STATE (US) | AVG TIME SPENT IN STATE (US) | |Snooze | 140,267 | 6,400,073 | 45.63 | | CEDE | 50,884 | 6,553,641 | 128.80 | Above data was gathered with the help of power:cpu_idle tracepoint, and was recorded for 45 secs while the daytrader benchmark was running. [1] : https://github.com/WASdev/sample.daytrader7 Thanks, Gautam
diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c index f3c9d49f0f2a..cf99ca103f9b 100644 --- a/drivers/cpuidle/governors/menu.c +++ b/drivers/cpuidle/governors/menu.c @@ -354,17 +354,6 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev, idx = i; /* first enabled state */ if (s->target_residency_ns > predicted_ns) { - /* - * Use a physical idle state, not busy polling, unless - * a timer is going to trigger soon enough. - */ - if ((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) && - s->exit_latency_ns <= latency_req && - s->target_residency_ns <= data->next_timer_ns) { - predicted_ns = s->target_residency_ns; - idx = i; - break; - } if (predicted_ns < TICK_NSEC) break;
Update the cpuidle menu governor to avoid prioritizing physical states over polling states when predicted idle duration is lesser than the physical states target residency duration for performance gains. Signed-off-by: Aboorva Devarajan <aboorvad@linux.ibm.com> --- drivers/cpuidle/governors/menu.c | 11 ----------- 1 file changed, 11 deletions(-)