From patchwork Fri Jul 28 14:55:16 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 13332032
Message-ID: <20230728145808.835742568@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 28 Jul 2023 16:55:16 +0200
From: Peter Zijlstra
To: anna-maria@linutronix.de, rafael@kernel.org, tglx@linutronix.de,
 frederic@kernel.org, gautham.shenoy@amd.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
 daniel.lezcano@linaro.org, linux-pm@vger.kernel.org, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com
Subject: [RFC][PATCH 1/3] cpuidle: Inject tick boundary state
References: <20230728145515.990749537@infradead.org>
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

In order to facilitate governors that track history in idle-state
buckets (TEO) making a useful decision about NOHZ, make sure we have a
bucket that counts tick-and-longer.

In order to be inclusive of the tick itself -- after all, if we do not
disable NOHZ we'll sleep for a full tick, the actual boundary should
be just short of a full tick.

IOW, when registering the idle-states, add one that is always
disabled, just to have a bucket.

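For illustration only (not part of the patch): a minimal userspace
sketch of the boundary arithmetic and of slotting a disabled placeholder
into a table sorted by target residency. Everything named sketch_* or
SKETCH_* is hypothetical, and HZ=1000 is an assumption; the kernel
derives TICK_NSEC from the configured HZ and works on struct
cpuidle_state rather than this toy structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: HZ=1000, so one tick lasts 1,000,000 ns. */
#define SKETCH_TICK_NSEC	1000000ULL
/* Boundary just short of a full tick: the tick minus 1/32 of a tick. */
#define SKETCH_SHORT_TICK_NSEC	(SKETCH_TICK_NSEC - SKETCH_TICK_NSEC / 32)
#define SKETCH_STATE_MAX	16

/* Hypothetical stand-in for the fields of an idle state used here. */
struct sketch_state {
	const char *name;
	uint64_t target_residency_ns;
	int unusable;	/* never entered; exists only as a history bucket */
};

/*
 * Insert a disabled "TICK" placeholder into a table that is sorted by
 * ascending target residency. Returns the new number of states.
 */
static size_t sketch_insert_tick(struct sketch_state *states, size_t count)
{
	size_t i;

	assert(count < SKETCH_STATE_MAX);

	/* Find the first state at or beyond the boundary. */
	for (i = 0; i < count; i++) {
		if (states[i].target_residency_ns >= SKETCH_SHORT_TICK_NSEC)
			break;
	}

	/* A state sitting exactly on the boundary already is the bucket. */
	if (i < count && states[i].target_residency_ns == SKETCH_SHORT_TICK_NSEC)
		return count;

	/* Shift deeper states up by one slot and fill the gap. */
	memmove(&states[i + 1], &states[i], (count - i) * sizeof(states[0]));
	states[i] = (struct sketch_state){ "TICK", SKETCH_SHORT_TICK_NSEC, 1 };
	return count + 1;
}
```

With the assumed 1 ms tick the boundary is 968750 ns, so a table holding
a 2 us state and a 2 ms state ends up with the placeholder between the
two, and a table with only shallow states gets it appended at the end.
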
Signed-off-by: Peter Zijlstra (Intel)
---
 drivers/cpuidle/cpuidle.h |    2 +
 drivers/cpuidle/driver.c  |   48 +++++++++++++++++++++++++++++++++++++++++++++-
 include/linux/cpuidle.h   |    2 -
 3 files changed, 50 insertions(+), 2 deletions(-)

--- a/drivers/cpuidle/cpuidle.h
+++ b/drivers/cpuidle/cpuidle.h
@@ -72,4 +72,6 @@ static inline void cpuidle_coupled_unreg
 }
 #endif
 
+#define SHORT_TICK_NSEC	(TICK_NSEC - TICK_NSEC/32)
+
 #endif /* __DRIVER_CPUIDLE_H */
--- a/drivers/cpuidle/driver.c
+++ b/drivers/cpuidle/driver.c
@@ -147,13 +147,37 @@ static void cpuidle_setup_broadcast_time
 		tick_broadcast_disable();
 }
 
+static int tick_enter(struct cpuidle_device *dev,
+		      struct cpuidle_driver *drv,
+		      int index)
+{
+	return -ENODEV;
+}
+
+static void __cpuidle_state_init_tick(struct cpuidle_state *s)
+{
+	strcpy(s->name, "TICK");
+	strcpy(s->desc, "(no-op)");
+
+	s->target_residency_ns = SHORT_TICK_NSEC;
+	s->target_residency = div_u64(SHORT_TICK_NSEC, NSEC_PER_USEC);
+
+	s->exit_latency_ns = 0;
+	s->exit_latency = 0;
+
+	s->flags |= CPUIDLE_FLAG_UNUSABLE;
+
+	s->enter = tick_enter;
+	s->enter_s2idle = tick_enter;
+}
+
 /**
  * __cpuidle_driver_init - initialize the driver's internal data
  * @drv: a valid pointer to a struct cpuidle_driver
  */
 static void __cpuidle_driver_init(struct cpuidle_driver *drv)
 {
-	int i;
+	int tick = 0, i;
 
 	/*
 	 * Use all possible CPUs as the default, because if the kernel boots
@@ -163,6 +187,9 @@ static void __cpuidle_driver_init(struct
 	if (!drv->cpumask)
 		drv->cpumask = (struct cpumask *)cpu_possible_mask;
 
+	if (WARN_ON_ONCE(drv->state_count >= CPUIDLE_STATE_MAX-2))
+		tick = 1;
+
 	for (i = 0; i < drv->state_count; i++) {
 		struct cpuidle_state *s = &drv->states[i];
 
@@ -192,6 +219,25 @@ static void __cpuidle_driver_init(struct
 			s->exit_latency_ns = 0;
 		else
 			s->exit_latency = div_u64(s->exit_latency_ns, NSEC_PER_USEC);
+
+		if (!tick && s->target_residency_ns >= SHORT_TICK_NSEC) {
+			tick = 1;
+
+			if (s->target_residency_ns == SHORT_TICK_NSEC)
+				continue;
+
+			memmove(&drv->states[i+1], &drv->states[i],
+				sizeof(struct cpuidle_state) * (CPUIDLE_STATE_MAX - i - 1));
+			__cpuidle_state_init_tick(s);
+			drv->state_count++;
+			i++;
+		}
+	}
+
+	if (!tick) {
+		struct cpuidle_state *s = &drv->states[i];
+		__cpuidle_state_init_tick(s);
+		drv->state_count++;
 	}
 }
 
--- a/include/linux/cpuidle.h
+++ b/include/linux/cpuidle.h
@@ -16,7 +16,7 @@
 #include
 #include
 
-#define CPUIDLE_STATE_MAX	10
+#define CPUIDLE_STATE_MAX	16
 #define CPUIDLE_NAME_LEN	16
 #define CPUIDLE_DESC_LEN	32
 

From patchwork Fri Jul 28 14:55:17 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 13332031
Message-ID: <20230728145808.902892871@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 28 Jul 2023 16:55:17 +0200
From: Peter Zijlstra
To: anna-maria@linutronix.de, rafael@kernel.org, tglx@linutronix.de,
 frederic@kernel.org, gautham.shenoy@amd.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
 daniel.lezcano@linaro.org, linux-pm@vger.kernel.org, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com
Subject: [RFC][PATCH 2/3] cpuidle,teo: Improve NOHZ management
References: <20230728145515.990749537@infradead.org>
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

With cpuidle having added a TICK bucket, TEO will account all TICK and
longer idles there. This means we can now make an informed decision
about stopping the tick.
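
For illustration only (not part of the patch): a minimal userspace
sketch of that decision. It keeps the tick when more than half of the
recorded idles ('hits + intercepts') landed in buckets below the TICK
bucket. The sketch_* names are hypothetical; the governor itself walks
struct teo_bin and the driver's state table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-state history counters. */
struct sketch_bin {
	unsigned int hits;
	unsigned int intercepts;
};

/*
 * Return true when the tick should be stopped.
 * bins[0 .. tick_idx-1] are the buckets below the TICK bucket.
 */
static bool sketch_should_stop_tick(const struct sketch_bin *bins,
				    size_t nr_bins, size_t tick_idx)
{
	unsigned int below = 0, total = 0;
	size_t i;

	assert(tick_idx <= nr_bins);

	for (i = 0; i < nr_bins; i++) {
		unsigned int sum = bins[i].hits + bins[i].intercepts;

		total += sum;
		if (i < tick_idx)
			below += sum;
	}

	/* More than 50% of idles ended before the tick: keep it running. */
	return !(2 * below > total);
}
```

A CPU whose history is dominated by short idles therefore keeps the
tick, while exactly 50% or less still allows stopping it.
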
If the sum of 'hit+intercepts' of all states below the TICK bucket is
more than 50%, it is most likely we'll not reach the tick this time
around either, so stopping the tick doesn't make sense.

If we don't stop the tick, don't bother calling
tick_nohz_get_sleep_length() and assume duration is no longer than a
tick (could be improved to still look at the current pending time and
timers).

Since we have this extra state, remove the state_count based early
decisions.

Signed-off-by: Peter Zijlstra (Intel)
---
 drivers/cpuidle/governors/teo.c |   97 ++++++++++++++--------------------------
 1 file changed, 34 insertions(+), 63 deletions(-)

--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -139,6 +139,7 @@
 #include
 #include
 #include
+#include "../cpuidle.h"
 
 /*
  * The number of bits to shift the CPU's capacity by in order to determine
@@ -197,7 +198,6 @@ struct teo_cpu {
 	int next_recent_idx;
 	int recent_idx[NR_RECENT];
 	unsigned long util_threshold;
-	bool utilized;
 };
 
 static DEFINE_PER_CPU(struct teo_cpu, teo_cpus);
@@ -276,11 +276,11 @@ static void teo_update(struct cpuidle_dr
 
 		cpu_data->total += bin->hits + bin->intercepts;
 
-		if (target_residency_ns <= cpu_data->sleep_length_ns) {
+		if (target_residency_ns <= cpu_data->sleep_length_ns)
 			idx_timer = i;
-			if (target_residency_ns <= measured_ns)
-				idx_duration = i;
-		}
+
+		if (target_residency_ns <= measured_ns)
+			idx_duration = i;
 	}
 
 	i = cpu_data->next_recent_idx++;
@@ -362,11 +362,12 @@ static int teo_select(struct cpuidle_dri
 	unsigned int recent_sum = 0;
 	unsigned int idx_hit_sum = 0;
 	unsigned int hit_sum = 0;
+	unsigned int tick_sum = 0;
 	int constraint_idx = 0;
 	int idx0 = 0, idx = -1;
 	bool alt_intercepts, alt_recent;
 	ktime_t delta_tick;
-	s64 duration_ns;
+	s64 duration_ns = TICK_NSEC;
 	int i;
 
 	if (dev->last_state_idx >= 0) {
@@ -376,36 +377,26 @@ static int teo_select(struct cpuidle_dri
 
 	cpu_data->time_span_ns = local_clock();
 
-	duration_ns = tick_nohz_get_sleep_length(&delta_tick);
-	cpu_data->sleep_length_ns = duration_ns;
+	/* Should we stop the tick? */
+	for (i = 1; i < drv->state_count; i++) {
+		struct teo_bin *prev_bin = &cpu_data->state_bins[i-1];
+		struct cpuidle_state *s = &drv->states[i];
 
-	/* Check if there is any choice in the first place. */
-	if (drv->state_count < 2) {
-		idx = 0;
-		goto end;
-	}
-	if (!dev->states_usage[0].disable) {
-		idx = 0;
-		if (drv->states[1].target_residency_ns > duration_ns)
-			goto end;
-	}
+		tick_sum += prev_bin->intercepts;
+		tick_sum += prev_bin->hits;
 
-	cpu_data->utilized = teo_cpu_is_utilized(dev->cpu, cpu_data);
-	/*
-	 * If the CPU is being utilized over the threshold and there are only 2
-	 * states to choose from, the metrics need not be considered, so choose
-	 * the shallowest non-polling state and exit.
-	 */
-	if (drv->state_count < 3 && cpu_data->utilized) {
-		for (i = 0; i < drv->state_count; ++i) {
-			if (!dev->states_usage[i].disable &&
-			    !(drv->states[i].flags & CPUIDLE_FLAG_POLLING)) {
-				idx = i;
-				goto end;
-			}
-		}
+		if (s->target_residency_ns >= SHORT_TICK_NSEC)
+			break;
 	}
 
+	if (2*tick_sum > cpu_data->total)
+		*stop_tick = false;
+
+	/* If we do stop the tick, ask for the next timer. */
+	if (*stop_tick)
+		duration_ns = tick_nohz_get_sleep_length(&delta_tick);
+	cpu_data->sleep_length_ns = duration_ns;
+
 	/*
 	 * Find the deepest idle state whose target residency does not exceed
 	 * the current sleep length and the deepest idle state not deeper than
@@ -446,13 +437,13 @@ static int teo_select(struct cpuidle_dri
 		idx_recent_sum = recent_sum;
 	}
 
-	/* Avoid unnecessary overhead. */
-	if (idx < 0) {
-		idx = 0; /* No states enabled, must use 0. */
-		goto end;
-	} else if (idx == idx0) {
-		goto end;
-	}
+	/* No states enabled, must use 0 */
+	if (idx < 0)
+		return 0;
+
+	/* No point looking for something shallower than the first enabled state */
+	if (idx == idx0)
+		return idx;
 
 	/*
 	 * If the sum of the intercepts metric for all of the idle states
@@ -541,29 +532,9 @@ static int teo_select(struct cpuidle_dri
 	 * If the CPU is being utilized over the threshold, choose a shallower
 	 * non-polling state to improve latency
 	 */
-	if (cpu_data->utilized)
+	if (teo_cpu_is_utilized(dev->cpu, cpu_data))
 		idx = teo_find_shallower_state(drv, dev, idx, duration_ns, true);
 
-end:
-	/*
-	 * Don't stop the tick if the selected state is a polling one or if the
-	 * expected idle duration is shorter than the tick period length.
-	 */
-	if (((drv->states[idx].flags & CPUIDLE_FLAG_POLLING) ||
-	    duration_ns < TICK_NSEC) && !tick_nohz_tick_stopped()) {
-		*stop_tick = false;
-
-		/*
-		 * The tick is not going to be stopped, so if the target
-		 * residency of the state to be returned is not within the time
-		 * till the closest timer including the tick, try to correct
-		 * that.
-		 */
-		if (idx > idx0 &&
-		    drv->states[idx].target_residency_ns > delta_tick)
-			idx = teo_find_shallower_state(drv, dev, idx, delta_tick, false);
-	}
-
 	return idx;
 }
 

From patchwork Fri Jul 28 14:55:18 2023
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Zijlstra
X-Patchwork-Id: 13332030
Message-ID: <20230728145808.970594909@infradead.org>
User-Agent: quilt/0.66
Date: Fri, 28 Jul 2023 16:55:18 +0200
From: Peter Zijlstra
To: anna-maria@linutronix.de, rafael@kernel.org, tglx@linutronix.de,
 frederic@kernel.org, gautham.shenoy@amd.com
Cc: linux-kernel@vger.kernel.org, peterz@infradead.org,
 daniel.lezcano@linaro.org, linux-pm@vger.kernel.org, mingo@redhat.com,
 juri.lelli@redhat.com, vincent.guittot@linaro.org,
 dietmar.eggemann@arm.com, rostedt@goodmis.org, bsegall@google.com,
 mgorman@suse.de, bristot@redhat.com, vschneid@redhat.com
Subject: [RFC][PATCH 3/3] cpuidle,teo: Improve state selection
References: <20230728145515.990749537@infradead.org>
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

When selecting a state, stop when history tells us 66% of recent idles
were at or below our current state.

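For illustration only (not part of the patch): a minimal userspace
sketch of the cut-off, reduced to one counter per bucket. The real
selection loop also honours the sleep length, the latency constraint
and disabled states, all of which are left out here; sketch_select_idx
is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * counts[i] is the number of recorded idles ('hits + intercepts') that
 * fell into bucket i. Walk towards deeper states and stop as soon as
 * the buckets below the candidate hold more than 2/3 of all idles.
 */
static size_t sketch_select_idx(const unsigned int *counts, size_t nr)
{
	unsigned int total = 0, below = 0, thresh;
	size_t i, idx = 0;

	assert(nr > 0);

	for (i = 0; i < nr; i++)
		total += counts[i];

	thresh = 2 * total / 3;	/* integer arithmetic, the "66%" */

	for (i = 1; i < nr; i++) {
		below += counts[i - 1];
		if (below > thresh)
			break;	/* history says we rarely get this deep */
		idx = i;
	}

	return idx;
}
```

With counts {70, 20, 10} the walk stops at index 0, because 70 of 100
idles already ended in the shallowest bucket; with {30, 30, 40} it
reaches index 2.
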
Signed-off-by: Peter Zijlstra (Intel)
---
 drivers/cpuidle/governors/teo.c |    6 ++++++
 1 file changed, 6 insertions(+)

--- a/drivers/cpuidle/governors/teo.c
+++ b/drivers/cpuidle/governors/teo.c
@@ -363,6 +363,7 @@ static int teo_select(struct cpuidle_dri
 	unsigned int idx_hit_sum = 0;
 	unsigned int hit_sum = 0;
 	unsigned int tick_sum = 0;
+	unsigned int thresh_sum = 0;
 	int constraint_idx = 0;
 	int idx0 = 0, idx = -1;
 	bool alt_intercepts, alt_recent;
@@ -397,6 +398,8 @@ static int teo_select(struct cpuidle_dri
 		duration_ns = tick_nohz_get_sleep_length(&delta_tick);
 	cpu_data->sleep_length_ns = duration_ns;
 
+	thresh_sum = 2 * cpu_data->total / 3; /* 66% */
+
 	/*
 	 * Find the deepest idle state whose target residency does not exceed
 	 * the current sleep length and the deepest idle state not deeper than
@@ -427,6 +430,9 @@ static int teo_select(struct cpuidle_dri
 		if (s->target_residency_ns > duration_ns)
 			break;
 
+		if (intercept_sum + hit_sum > thresh_sum)
+			break;
+
 		idx = i;
 
 		if (s->exit_latency_ns <= latency_req)