From patchwork Wed Dec 12 13:31:29 2012
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Vincent Guittot
X-Patchwork-Id: 1866231
From: Vincent Guittot
To: linux-kernel@vger.kernel.org, linux-arm-kernel@lists.infradead.org,
 linaro-dev@lists.linaro.org, peterz@infradead.org, mingo@kernel.org,
 linux@arm.linux.org.uk, pjt@google.com, santosh.shilimkar@ti.com,
 Morten.Rasmussen@arm.com, chander.kashyap@linaro.org, cmetcalf@tilera.com,
 tony.luck@intel.com
Subject: [RFC PATCH v2 3/6] sched: pack small tasks
Date: Wed, 12 Dec 2012 14:31:29 +0100
Message-Id: <1355319092-30980-4-git-send-email-vincent.guittot@linaro.org>
X-Mailer: git-send-email 1.7.9.5
In-Reply-To: <1355319092-30980-1-git-send-email-vincent.guittot@linaro.org>
References: <1355319092-30980-1-git-send-email-vincent.guittot@linaro.org>
Cc: len.brown@intel.com, alex.shi@intel.com, Vincent Guittot,
 viresh.kumar@linaro.org, amit.kucheria@linaro.org,
 preeti@linux.vnet.ibm.com, tglx@linutronix.de, paulmck@linux.vnet.ibm.com,
 arjan@linux.intel.com

During the creation of the sched_domains, we define a pack buddy CPU for
each CPU when one is available. We want to pack at every level where a
group of CPUs can be power gated independently of the others.
On a system that can't power gate a group of CPUs independently, the
SD_SHARE_POWERDOMAIN flag is set at all sched_domain levels and the buddy
is set to -1. This is the default behavior.

On a dual-cluster / dual-core system which can power gate each core and
each cluster independently, the buddy configuration will be:

      | Cluster 0   | Cluster 1   |
      | CPU0 | CPU1 | CPU2 | CPU3 |
-----------------------------------
buddy | CPU0 | CPU0 | CPU0 | CPU2 |

Small tasks tend to slip out of the periodic load balance, so the best
place to migrate them is at wake-up. The decision is O(1) because we only
check against one buddy CPU.

Signed-off-by: Vincent Guittot
---
 kernel/sched/core.c  |    1 +
 kernel/sched/fair.c  |  110 ++++++++++++++++++++++++++++++++++++++++++++++++++
 kernel/sched/sched.h |    5 +++
 3 files changed, 116 insertions(+)

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 4f36e9d..3436aad 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -5693,6 +5693,7 @@ cpu_attach_domain(struct sched_domain *sd, struct root_domain *rd, int cpu)
 	rcu_assign_pointer(rq->sd, sd);
 	destroy_sched_domains(tmp, cpu);
 
+	update_packing_domain(cpu);
 	update_domain_cache(cpu);
 }
 
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 9916d41..fc93d96 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -163,6 +163,73 @@ void sched_init_granularity(void)
 	update_sysctl();
 }
 
+
+#ifdef CONFIG_SMP
+/*
+ * Save the id of the optimal CPU that should be used to pack small tasks.
+ * The value -1 is used when no buddy has been found.
+ */
+DEFINE_PER_CPU(int, sd_pack_buddy);
+
+/* Look for the best buddy CPU that can be used to pack small tasks
+ * We make the assumption that it isn't worth packing on CPUs that share the
+ * same power line. We look for the 1st sched_domain without the
+ * SD_SHARE_POWERDOMAIN flag. Then we look for the sched_group with the lowest
+ * power per core, based on the assumption that its power efficiency is
+ * better */
+void update_packing_domain(int cpu)
+{
+	struct sched_domain *sd;
+	int id = -1;
+
+	sd = highest_flag_domain(cpu, SD_SHARE_POWERDOMAIN);
+	if (!sd)
+		sd = rcu_dereference_check_sched_domain(cpu_rq(cpu)->sd);
+	else
+		sd = sd->parent;
+
+	while (sd && (sd->flags & SD_LOAD_BALANCE)) {
+		struct sched_group *sg = sd->groups;
+		struct sched_group *pack = sg;
+		struct sched_group *tmp;
+
+		/*
+		 * The sched_domain of a CPU points to the local sched_group
+		 * and the 1st CPU of this local group is a good candidate
+		 */
+		id = cpumask_first(sched_group_cpus(pack));
+
+		/* loop over the sched groups to find the best one */
+		for (tmp = sg->next; tmp != sg; tmp = tmp->next) {
+			if (tmp->sgp->power * pack->group_weight >
+					pack->sgp->power * tmp->group_weight)
+				continue;
+
+			if ((tmp->sgp->power * pack->group_weight ==
+					pack->sgp->power * tmp->group_weight)
+			 && (cpumask_first(sched_group_cpus(tmp)) >= id))
+				continue;
+
+			/* we have found a better group */
+			pack = tmp;
+
+			/* Take the 1st CPU of the new group */
+			id = cpumask_first(sched_group_cpus(pack));
+		}
+
+		/* Look for a CPU other than itself */
+		if (id != cpu)
+			break;
+
+		sd = sd->parent;
+	}
+
+	pr_debug("CPU%d packing on CPU%d\n", cpu, id);
+	per_cpu(sd_pack_buddy, cpu) = id;
+}
+
+#endif /* CONFIG_SMP */
+
 #if BITS_PER_LONG == 32
 # define WMULT_CONST	(~0UL)
 #else
@@ -5083,6 +5150,46 @@ static bool numa_allow_migration(struct task_struct *p, int prev_cpu, int new_cp
 
 	return true;
 }
+static bool is_buddy_busy(int cpu)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	/*
+	 * A busy buddy is a CPU with a high load or a small load with a lot of
+	 * running tasks.
+	 */
+	return ((rq->avg.runnable_avg_sum << rq->nr_running) >
+			rq->avg.runnable_avg_period);
+}
+
+static bool is_light_task(struct task_struct *p)
+{
+	/* A light task is runnable less than 50% of the time on average */
+	return ((p->se.avg.runnable_avg_sum << 1) <
+			p->se.avg.runnable_avg_period);
+}
+
+static bool check_pack_buddy(int cpu, struct task_struct *p)
+{
+	int buddy = per_cpu(sd_pack_buddy, cpu);
+
+	/* No pack buddy for this CPU */
+	if (buddy == -1)
+		return false;
+
+	/* buddy is not an allowed CPU */
+	if (!cpumask_test_cpu(buddy, tsk_cpus_allowed(p)))
+		return false;
+
+	/*
+	 * If the task is a small one and the buddy is not overloaded,
+	 * we use the buddy CPU
+	 */
+	if (!is_light_task(p) || is_buddy_busy(buddy))
+		return false;
+
+	return true;
+}
 
 /*
  * sched_balance_self: balance the current task (running on cpu) in domains
@@ -5120,6 +5227,9 @@ select_task_rq_fair(struct task_struct *p, int sd_flag, int wake_flags)
 		return p->ideal_cpu;
 #endif
 
+	if (check_pack_buddy(cpu, p))
+		return per_cpu(sd_pack_buddy, cpu);
+
 	if (sd_flag & SD_BALANCE_WAKE) {
 		if (cpumask_test_cpu(cpu, tsk_cpus_allowed(p)))
 			want_affine = 1;
diff --git a/kernel/sched/sched.h b/kernel/sched/sched.h
index 92ba891..3802fc4 100644
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -892,6 +892,7 @@ extern const struct sched_class idle_sched_class;
 
 extern void trigger_load_balance(struct rq *rq, int cpu);
 extern void idle_balance(int this_cpu, struct rq *this_rq);
+extern void update_packing_domain(int cpu);
 
 #else /* CONFIG_SMP */
 
@@ -899,6 +900,10 @@ static inline void idle_balance(int cpu, struct rq *rq)
 {
 }
 
+static inline void update_packing_domain(int cpu)
+{
+}
+
 #endif
 
 extern void sysrq_sched_debug_show(void);