From patchwork Tue Nov 24 06:26:14 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viresh Kumar
X-Patchwork-Id: 11927271
X-Patchwork-Delegate: daniel.lezcano@linaro.org
From: Viresh Kumar
To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , "Rafael J.
Wysocki" , Viresh Kumar Cc: linux-kernel@vger.kernel.org, Quentin Perret , Lukasz Luba , linux-pm@vger.kernel.org Subject: [PATCH V4 1/3] sched/core: Move schedutil_cpu_util() to core.c Date: Tue, 24 Nov 2020 11:56:14 +0530 Message-Id: X-Mailer: git-send-email 2.25.0.rc1.19.g042ed3e048af In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org There is nothing schedutil specific in schedutil_cpu_util(), move it to core.c and define it only for CONFIG_SMP. Signed-off-by: Viresh Kumar Acked-by: Rafael J. Wysocki --- kernel/sched/core.c | 108 +++++++++++++++++++++++++++++++ kernel/sched/cpufreq_schedutil.c | 106 ------------------------------ kernel/sched/sched.h | 12 +--- 3 files changed, 109 insertions(+), 117 deletions(-) diff --git a/kernel/sched/core.c b/kernel/sched/core.c index d2003a7d5ab5..b81265aec4a0 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5117,6 +5117,114 @@ struct task_struct *idle_task(int cpu) return cpu_rq(cpu)->idle; } +#ifdef CONFIG_SMP +/* + * This function computes an effective utilization for the given CPU, to be + * used for frequency selection given the linear relation: f = u * f_max. + * + * The scheduler tracks the following metrics: + * + * cpu_util_{cfs,rt,dl,irq}() + * cpu_bw_dl() + * + * Where the cfs,rt and dl util numbers are tracked with the same metric and + * synchronized windows and are thus directly comparable. + * + * The cfs,rt,dl utilization are the running times measured with rq->clock_task + * which excludes things like IRQ and steal-time. These latter are then accrued + * in the irq utilization. + * + * The DL bandwidth number otoh is not a measured metric but a value computed + * based on the task model parameters and gives the minimal utilization + * required to meet deadlines. 
+ */ +unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs, + unsigned long max, enum schedutil_type type, + struct task_struct *p) +{ + unsigned long dl_util, util, irq; + struct rq *rq = cpu_rq(cpu); + + if (!uclamp_is_used() && + type == FREQUENCY_UTIL && rt_rq_is_runnable(&rq->rt)) { + return max; + } + + /* + * Early check to see if IRQ/steal time saturates the CPU, can be + * because of inaccuracies in how we track these -- see + * update_irq_load_avg(). + */ + irq = cpu_util_irq(rq); + if (unlikely(irq >= max)) + return max; + + /* + * Because the time spend on RT/DL tasks is visible as 'lost' time to + * CFS tasks and we use the same metric to track the effective + * utilization (PELT windows are synchronized) we can directly add them + * to obtain the CPU's actual utilization. + * + * CFS and RT utilization can be boosted or capped, depending on + * utilization clamp constraints requested by currently RUNNABLE + * tasks. + * When there are no CFS RUNNABLE tasks, clamps are released and + * frequency will be gracefully reduced with the utilization decay. + */ + util = util_cfs + cpu_util_rt(rq); + if (type == FREQUENCY_UTIL) + util = uclamp_rq_util_with(rq, util, p); + + dl_util = cpu_util_dl(rq); + + /* + * For frequency selection we do not make cpu_util_dl() a permanent part + * of this sum because we want to use cpu_bw_dl() later on, but we need + * to check if the CFS+RT+DL sum is saturated (ie. no idle time) such + * that we select f_max when there is no idle time. + * + * NOTE: numerical errors or stop class might cause us to not quite hit + * saturation when we should -- something for later. + */ + if (util + dl_util >= max) + return max; + + /* + * OTOH, for energy computation we need the estimated running time, so + * include util_dl and ignore dl_bw. + */ + if (type == ENERGY_UTIL) + util += dl_util; + + /* + * There is still idle time; further improve the number by using the + * irq metric. 
Because IRQ/steal time is hidden from the task clock we + * need to scale the task numbers: + * + * max - irq + * U' = irq + --------- * U + * max + */ + util = scale_irq_capacity(util, irq, max); + util += irq; + + /* + * Bandwidth required by DEADLINE must always be granted while, for + * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism + * to gracefully reduce the frequency when no tasks show up for longer + * periods of time. + * + * Ideally we would like to set bw_dl as min/guaranteed freq and util + + * bw_dl as requested freq. However, cpufreq is not yet ready for such + * an interface. So, we only do the latter for now. + */ + if (type == FREQUENCY_UTIL) + util += cpu_bw_dl(rq); + + return min(max, util); +} +#endif /* CONFIG_SMP */ + /** * find_process_by_pid - find a process with a matching PID value. * @pid: the pid in question. diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index e254745a82cb..2d44befb322b 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -169,112 +169,6 @@ static unsigned int get_next_freq(struct sugov_policy *sg_policy, return cpufreq_driver_resolve_freq(policy, freq); } -/* - * This function computes an effective utilization for the given CPU, to be - * used for frequency selection given the linear relation: f = u * f_max. - * - * The scheduler tracks the following metrics: - * - * cpu_util_{cfs,rt,dl,irq}() - * cpu_bw_dl() - * - * Where the cfs,rt and dl util numbers are tracked with the same metric and - * synchronized windows and are thus directly comparable. - * - * The cfs,rt,dl utilization are the running times measured with rq->clock_task - * which excludes things like IRQ and steal-time. These latter are then accrued - * in the irq utilization. - * - * The DL bandwidth number otoh is not a measured metric but a value computed - * based on the task model parameters and gives the minimal utilization - * required to meet deadlines. 
- */ -unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs, - unsigned long max, enum schedutil_type type, - struct task_struct *p) -{ - unsigned long dl_util, util, irq; - struct rq *rq = cpu_rq(cpu); - - if (!uclamp_is_used() && - type == FREQUENCY_UTIL && rt_rq_is_runnable(&rq->rt)) { - return max; - } - - /* - * Early check to see if IRQ/steal time saturates the CPU, can be - * because of inaccuracies in how we track these -- see - * update_irq_load_avg(). - */ - irq = cpu_util_irq(rq); - if (unlikely(irq >= max)) - return max; - - /* - * Because the time spend on RT/DL tasks is visible as 'lost' time to - * CFS tasks and we use the same metric to track the effective - * utilization (PELT windows are synchronized) we can directly add them - * to obtain the CPU's actual utilization. - * - * CFS and RT utilization can be boosted or capped, depending on - * utilization clamp constraints requested by currently RUNNABLE - * tasks. - * When there are no CFS RUNNABLE tasks, clamps are released and - * frequency will be gracefully reduced with the utilization decay. - */ - util = util_cfs + cpu_util_rt(rq); - if (type == FREQUENCY_UTIL) - util = uclamp_rq_util_with(rq, util, p); - - dl_util = cpu_util_dl(rq); - - /* - * For frequency selection we do not make cpu_util_dl() a permanent part - * of this sum because we want to use cpu_bw_dl() later on, but we need - * to check if the CFS+RT+DL sum is saturated (ie. no idle time) such - * that we select f_max when there is no idle time. - * - * NOTE: numerical errors or stop class might cause us to not quite hit - * saturation when we should -- something for later. - */ - if (util + dl_util >= max) - return max; - - /* - * OTOH, for energy computation we need the estimated running time, so - * include util_dl and ignore dl_bw. - */ - if (type == ENERGY_UTIL) - util += dl_util; - - /* - * There is still idle time; further improve the number by using the - * irq metric. 
Because IRQ/steal time is hidden from the task clock we - * need to scale the task numbers: - * - * max - irq - * U' = irq + --------- * U - * max - */ - util = scale_irq_capacity(util, irq, max); - util += irq; - - /* - * Bandwidth required by DEADLINE must always be granted while, for - * FAIR and RT, we use blocked utilization of IDLE CPUs as a mechanism - * to gracefully reduce the frequency when no tasks show up for longer - * periods of time. - * - * Ideally we would like to set bw_dl as min/guaranteed freq and util + - * bw_dl as requested freq. However, cpufreq is not yet ready for such - * an interface. So, we only do the latter for now. - */ - if (type == FREQUENCY_UTIL) - util += cpu_bw_dl(rq); - - return min(max, util); -} - static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu) { struct rq *rq = cpu_rq(sg_cpu->cpu); diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index df80bfcea92e..0db6bcf0881f 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2484,7 +2484,6 @@ static inline unsigned long capacity_orig_of(int cpu) { return cpu_rq(cpu)->cpu_capacity_orig; } -#endif /** * enum schedutil_type - CPU utilization type @@ -2501,8 +2500,6 @@ enum schedutil_type { ENERGY_UTIL, }; -#ifdef CONFIG_CPU_FREQ_GOV_SCHEDUTIL - unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs, unsigned long max, enum schedutil_type type, struct task_struct *p); @@ -2533,14 +2530,7 @@ static inline unsigned long cpu_util_rt(struct rq *rq) { return READ_ONCE(rq->avg_rt.util_avg); } -#else /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */ -static inline unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs, - unsigned long max, enum schedutil_type type, - struct task_struct *p) -{ - return 0; -} -#endif /* CONFIG_CPU_FREQ_GOV_SCHEDUTIL */ +#endif #ifdef CONFIG_HAVE_SCHED_AVG_IRQ static inline unsigned long cpu_util_irq(struct rq *rq) From patchwork Tue Nov 24 06:26:15 2020 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viresh Kumar
X-Patchwork-Id: 11927273
X-Patchwork-Delegate: daniel.lezcano@linaro.org
From: Viresh Kumar
To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Juri Lelli , Dietmar Eggemann , Steven Rostedt , Ben Segall , Mel Gorman , Daniel Bristot de Oliveira , "Rafael J.
Wysocki" , Viresh Kumar Cc: linux-kernel@vger.kernel.org, Quentin Perret , Lukasz Luba , linux-pm@vger.kernel.org Subject: [PATCH V4 2/3] sched/core: Rename schedutil_cpu_util() and allow rest of the kernel to use it Date: Tue, 24 Nov 2020 11:56:15 +0530 Message-Id: <9a5442b916f9667e714dd84fe4e3fc26f8bcc887.1606198885.git.viresh.kumar@linaro.org> X-Mailer: git-send-email 2.25.0.rc1.19.g042ed3e048af In-Reply-To: References: MIME-Version: 1.0 Precedence: bulk List-ID: X-Mailing-List: linux-pm@vger.kernel.org There is nothing schedutil specific in schedutil_cpu_util(), rename it to effective_cpu_util(). Also create and expose another wrapper sched_cpu_util() which can be used by other parts of the kernel, like thermal core (that will be done in a later commit). Signed-off-by: Viresh Kumar Acked-by: Rafael J. Wysocki --- include/linux/sched.h | 21 +++++++++++++++++++++ kernel/sched/core.c | 11 +++++++++-- kernel/sched/cpufreq_schedutil.c | 2 +- kernel/sched/fair.c | 6 +++--- kernel/sched/sched.h | 19 ++----------------- 5 files changed, 36 insertions(+), 23 deletions(-) diff --git a/include/linux/sched.h b/include/linux/sched.h index 063cd120b459..926b944dae5e 100644 --- a/include/linux/sched.h +++ b/include/linux/sched.h @@ -1926,6 +1926,27 @@ extern long sched_getaffinity(pid_t pid, struct cpumask *mask); #define TASK_SIZE_OF(tsk) TASK_SIZE #endif +#ifdef CONFIG_SMP +/** + * enum cpu_util_type - CPU utilization type + * @FREQUENCY_UTIL: Utilization used to select frequency + * @ENERGY_UTIL: Utilization used during energy calculation + * + * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time + * need to be aggregated differently depending on the usage made of them. This + * enum is used within sched_cpu_util() to differentiate the types of + * utilization expected by the callers, and adjust the aggregation accordingly. 
+ */ +enum cpu_util_type { + FREQUENCY_UTIL, + ENERGY_UTIL, +}; + +/* Returns effective CPU utilization, as seen by the scheduler */ +unsigned long sched_cpu_util(int cpu, enum cpu_util_type type, + unsigned long max); +#endif /* CONFIG_SMP */ + #ifdef CONFIG_RSEQ /* diff --git a/kernel/sched/core.c b/kernel/sched/core.c index b81265aec4a0..845c976ccd53 100644 --- a/kernel/sched/core.c +++ b/kernel/sched/core.c @@ -5138,8 +5138,8 @@ struct task_struct *idle_task(int cpu) * based on the task model parameters and gives the minimal utilization * required to meet deadlines. */ -unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs, - unsigned long max, enum schedutil_type type, +unsigned long effective_cpu_util(int cpu, unsigned long util_cfs, + unsigned long max, enum cpu_util_type type, struct task_struct *p) { unsigned long dl_util, util, irq; @@ -5223,6 +5223,13 @@ unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs, return min(max, util); } + +unsigned long sched_cpu_util(int cpu, enum cpu_util_type type, + unsigned long max) +{ + return effective_cpu_util(cpu, cpu_util_cfs(cpu_rq(cpu)), max, type, + NULL); +} #endif /* CONFIG_SMP */ /** diff --git a/kernel/sched/cpufreq_schedutil.c b/kernel/sched/cpufreq_schedutil.c index 2d44befb322b..e71627a3792b 100644 --- a/kernel/sched/cpufreq_schedutil.c +++ b/kernel/sched/cpufreq_schedutil.c @@ -178,7 +178,7 @@ static unsigned long sugov_get_util(struct sugov_cpu *sg_cpu) sg_cpu->max = max; sg_cpu->bw_dl = cpu_bw_dl(rq); - return schedutil_cpu_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL, NULL); + return effective_cpu_util(sg_cpu->cpu, util, max, FREQUENCY_UTIL, NULL); } /** diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c index 290f9e38378c..0e1c8eb7ad53 100644 --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -6499,7 +6499,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd) * is already enough to scale the EM reported power * consumption at the (eventually 
clamped) cpu_capacity. */ - sum_util += schedutil_cpu_util(cpu, util_cfs, cpu_cap, + sum_util += effective_cpu_util(cpu, util_cfs, cpu_cap, ENERGY_UTIL, NULL); /* @@ -6509,7 +6509,7 @@ compute_energy(struct task_struct *p, int dst_cpu, struct perf_domain *pd) * NOTE: in case RT tasks are running, by default the * FREQUENCY_UTIL's utilization can be max OPP. */ - cpu_util = schedutil_cpu_util(cpu, util_cfs, cpu_cap, + cpu_util = effective_cpu_util(cpu, util_cfs, cpu_cap, FREQUENCY_UTIL, tsk); max_util = max(max_util, cpu_util); } @@ -6607,7 +6607,7 @@ static int find_energy_efficient_cpu(struct task_struct *p, int prev_cpu) * IOW, placing the task there would make the CPU * overutilized. Take uclamp into account to see how * much capacity we can get out of the CPU; this is - * aligned with schedutil_cpu_util(). + * aligned with sched_cpu_util(). */ util = uclamp_rq_util_with(cpu_rq(cpu), util, p); if (!fits_capacity(util, cpu_cap)) diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h index 0db6bcf0881f..4fab3b930ace 100644 --- a/kernel/sched/sched.h +++ b/kernel/sched/sched.h @@ -2485,23 +2485,8 @@ static inline unsigned long capacity_orig_of(int cpu) return cpu_rq(cpu)->cpu_capacity_orig; } -/** - * enum schedutil_type - CPU utilization type - * @FREQUENCY_UTIL: Utilization used to select frequency - * @ENERGY_UTIL: Utilization used during energy calculation - * - * The utilization signals of all scheduling classes (CFS/RT/DL) and IRQ time - * need to be aggregated differently depending on the usage made of them. This - * enum is used within schedutil_freq_util() to differentiate the types of - * utilization expected by the callers, and adjust the aggregation accordingly. 
- */ -enum schedutil_type { - FREQUENCY_UTIL, - ENERGY_UTIL, -}; - -unsigned long schedutil_cpu_util(int cpu, unsigned long util_cfs, - unsigned long max, enum schedutil_type type, +unsigned long effective_cpu_util(int cpu, unsigned long util_cfs, + unsigned long max, enum cpu_util_type type, struct task_struct *p); static inline unsigned long cpu_bw_dl(struct rq *rq)
From patchwork Tue Nov 24 06:26:16 2020
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Viresh Kumar
X-Patchwork-Id: 11927275
X-Patchwork-Delegate: daniel.lezcano@linaro.org
From: Viresh Kumar
To: Ingo Molnar , Peter Zijlstra , Vincent Guittot , Amit Daniel
Kachhap , Daniel Lezcano , Viresh Kumar , Javi Merino , Zhang Rui , Amit Kucheria
Cc: linux-kernel@vger.kernel.org, Quentin Perret , Lukasz Luba , linux-pm@vger.kernel.org
Subject: [PATCH V4 3/3] thermal: cpufreq_cooling: Reuse sched_cpu_util() for SMP platforms
Date: Tue, 24 Nov 2020 11:56:16 +0530
Message-Id:
X-Mailer: git-send-email 2.25.0.rc1.19.g042ed3e048af
In-Reply-To:
References:
MIME-Version: 1.0
Precedence: bulk
List-ID:
X-Mailing-List: linux-pm@vger.kernel.org

Several parts of the kernel already use the effective CPU utilization (as seen by the scheduler) to get the current load on the CPU; do the same here instead of depending on the idle time of the CPU, which is comparatively less accurate.

This is also the right thing to do as it makes the cpufreq governor (schedutil) align better with the cpufreq_cooling driver, as the power requested by the cpufreq_cooling governor will exactly match the next frequency requested by the schedutil governor since they are both using the same metric to calculate load.

This was tested on the ARM Hikey6220 platform with hackbench, sysbench and schbench. None of them showed any regression or significant improvements. Schbench is the most important one of these, as it creates the scenario where the utilization numbers provide a better estimate of the future.

Scenario 1: The CPUs were mostly idle in the previous polling window of the IPA governor as the tasks were sleeping and here are the details from traces (load is in %):

 Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=203 load={{0x35,0x1,0x0,0x31,0x0,0x0,0x64,0x0}} dynamic_power=1339
 New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=600 load={{0x60,0x46,0x45,0x45,0x48,0x3b,0x61,0x44}} dynamic_power=3960

Here, the "Old" line gives the load and requested_power (dynamic_power here) numbers calculated using the idle time based implementation, while "New" is based on the CPU utilization from the scheduler.
As can be clearly seen, the load and requested_power numbers are simply incorrect in the idle time based approach and the numbers collected from CPU's utilization are much closer to the reality. Scenario 2: The CPUs were busy in the previous polling window of the IPA governor: Old: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=800 load={{0x64,0x64,0x64,0x64,0x64,0x64,0x64,0x64}} dynamic_power=5280 New: thermal_power_cpu_get_power: cpus=00000000,000000ff freq=1200000 total_load=708 load={{0x4d,0x5c,0x5c,0x5b,0x5c,0x5c,0x51,0x5b}} dynamic_power=4672 As can be seen, the idle time based load is 100% for all the CPUs as it took only the last window into account, but in reality the CPUs aren't that loaded as shown by the utilization numbers. Reviewed-by: Lukasz Luba Signed-off-by: Viresh Kumar --- drivers/thermal/cpufreq_cooling.c | 68 ++++++++++++++++++++++++------- 1 file changed, 54 insertions(+), 14 deletions(-) diff --git a/drivers/thermal/cpufreq_cooling.c b/drivers/thermal/cpufreq_cooling.c index cc2959f22f01..5aff2ac4b77f 100644 --- a/drivers/thermal/cpufreq_cooling.c +++ b/drivers/thermal/cpufreq_cooling.c @@ -76,7 +76,9 @@ struct cpufreq_cooling_device { struct em_perf_domain *em; struct cpufreq_policy *policy; struct list_head node; +#ifndef CONFIG_SMP struct time_in_idle *idle_time; +#endif struct freq_qos_request qos_req; }; @@ -132,14 +134,35 @@ static u32 cpu_power_to_freq(struct cpufreq_cooling_device *cpufreq_cdev, } /** - * get_load() - get load for a cpu since last updated - * @cpufreq_cdev: &struct cpufreq_cooling_device for this cpu - * @cpu: cpu number - * @cpu_idx: index of the cpu in time_in_idle* + * get_load() - get load for a cpu + * @cpufreq_cdev: struct cpufreq_cooling_device for the cpu + * @cpu: cpu number + * @cpu_idx: index of the cpu in time_in_idle array * * Return: The average load of cpu @cpu in percentage since this * function was last called. 
*/ +#ifdef CONFIG_SMP +static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu, + int cpu_idx) +{ + unsigned long max = arch_scale_cpu_capacity(cpu); + unsigned long util; + + util = sched_cpu_util(cpu, ENERGY_UTIL, max); + return (util * 100) / max; +} + +static inline int allocate_idle_time(struct cpufreq_cooling_device *cpufreq_cdev) +{ + return 0; +} + +static inline void free_idle_time(struct cpufreq_cooling_device *cpufreq_cdev) +{ +} + +#else /* !CONFIG_SMP */ static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu, int cpu_idx) { @@ -162,6 +185,26 @@ static u32 get_load(struct cpufreq_cooling_device *cpufreq_cdev, int cpu, return load; } +static int allocate_idle_time(struct cpufreq_cooling_device *cpufreq_cdev) +{ + unsigned int num_cpus = cpumask_weight(cpufreq_cdev->policy->related_cpus); + + cpufreq_cdev->idle_time = kcalloc(num_cpus, + sizeof(*cpufreq_cdev->idle_time), + GFP_KERNEL); + if (!cpufreq_cdev->idle_time) + return -ENOMEM; + + return 0; +} + +static void free_idle_time(struct cpufreq_cooling_device *cpufreq_cdev) +{ + kfree(cpufreq_cdev->idle_time); + cpufreq_cdev->idle_time = NULL; +} +#endif /* CONFIG_SMP */ + /** * get_dynamic_power() - calculate the dynamic power * @cpufreq_cdev: &cpufreq_cooling_device for this cdev @@ -487,7 +530,7 @@ __cpufreq_cooling_register(struct device_node *np, struct thermal_cooling_device *cdev; struct cpufreq_cooling_device *cpufreq_cdev; char dev_name[THERMAL_NAME_LENGTH]; - unsigned int i, num_cpus; + unsigned int i; struct device *dev; int ret; struct thermal_cooling_device_ops *cooling_ops; @@ -498,7 +541,6 @@ __cpufreq_cooling_register(struct device_node *np, return ERR_PTR(-ENODEV); } - if (IS_ERR_OR_NULL(policy)) { pr_err("%s: cpufreq policy isn't valid: %p\n", __func__, policy); return ERR_PTR(-EINVAL); @@ -516,12 +558,10 @@ __cpufreq_cooling_register(struct device_node *np, return ERR_PTR(-ENOMEM); cpufreq_cdev->policy = policy; - num_cpus = 
cpumask_weight(policy->related_cpus); - cpufreq_cdev->idle_time = kcalloc(num_cpus, - sizeof(*cpufreq_cdev->idle_time), - GFP_KERNEL); - if (!cpufreq_cdev->idle_time) { - cdev = ERR_PTR(-ENOMEM); + + ret = allocate_idle_time(cpufreq_cdev); + if (ret) { + cdev = ERR_PTR(ret); goto free_cdev; } @@ -581,7 +621,7 @@ __cpufreq_cooling_register(struct device_node *np, remove_ida: ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id); free_idle_time: - kfree(cpufreq_cdev->idle_time); + free_idle_time(cpufreq_cdev); free_cdev: kfree(cpufreq_cdev); return cdev; @@ -674,7 +714,7 @@ void cpufreq_cooling_unregister(struct thermal_cooling_device *cdev) thermal_cooling_device_unregister(cdev); freq_qos_remove_request(&cpufreq_cdev->qos_req); ida_simple_remove(&cpufreq_ida, cpufreq_cdev->id); - kfree(cpufreq_cdev->idle_time); + free_idle_time(cpufreq_cdev); kfree(cpufreq_cdev); } EXPORT_SYMBOL_GPL(cpufreq_cooling_unregister);
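
The load arithmetic in patch 3/3 can be checked outside the kernel. The following is a minimal userspace sketch, not kernel code: load_percent() mirrors the "(util * 100) / max" expression from the SMP get_load(), and total_load() adds up per-CPU loads the way the total_load trace field appears to be formed (the per-CPU values in each trace line of the commit message sum to the reported total). The helper names and the trace_* arrays are hypothetical and exist only for illustration; the array contents are copied from the traces quoted in the commit message.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Percentage load from a utilization value and the CPU capacity,
 * mirroring the expression used by the SMP get_load() in the patch.
 * Hypothetical helper: in the kernel, util comes from sched_cpu_util()
 * and max from arch_scale_cpu_capacity().
 */
static unsigned int load_percent(unsigned long util, unsigned long max)
{
	assert(max != 0);	/* a zero capacity would divide by zero */
	return (unsigned int)((util * 100) / max);
}

/* Sum of the per-CPU load percentages (hypothetical helper). */
static unsigned int total_load(const unsigned int *load, size_t n)
{
	unsigned int sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += load[i];
	return sum;
}

/* Per-CPU load values copied from the traces in the commit message. */
static const unsigned int trace_s1_old[8] = {
	0x35, 0x1, 0x0, 0x31, 0x0, 0x0, 0x64, 0x0
};
static const unsigned int trace_s1_new[8] = {
	0x60, 0x46, 0x45, 0x45, 0x48, 0x3b, 0x61, 0x44
};
static const unsigned int trace_s2_old[8] = {
	0x64, 0x64, 0x64, 0x64, 0x64, 0x64, 0x64, 0x64
};
static const unsigned int trace_s2_new[8] = {
	0x4d, 0x5c, 0x5c, 0x5b, 0x5c, 0x5c, 0x51, 0x5b
};
```

With these helpers, total_load() over the four arrays gives 203, 600, 800 and 708, matching the total_load fields in the Scenario 1 and Scenario 2 traces, and load_percent(512, 1024) gives 50 for a CPU at half of an assumed capacity of 1024.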