From patchwork Mon May 6 06:56:44 2019
X-Patchwork-Id: 10930537
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
    George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
    Jan Beulich, Dario Faggioli, Roger Pau Monné
Date: Mon, 6 May 2019 08:56:44 +0200
Message-Id: <20190506065644.7415-46-jgross@suse.com>
In-Reply-To: <20190506065644.7415-1-jgross@suse.com>
References: <20190506065644.7415-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH RFC V2 45/45] xen/sched: add scheduling granularity enum

Add a scheduling granularity enum ("cpu", "core", "socket") for
specifying the scheduling granularity. Initially it is set to "cpu";
on x86 it can be modified via the new boot parameter "sched-gran".

The internal variable sched_granularity is set according to the
selected option once all cpus are online. In case sched_granularity > 1
the sched items of the idle vcpus and the sched resources of the
physical cpus need to be combined; this happens before the init_pdata
hook of the active scheduler is called.

A test is added verifying that all sched resources hold the same number
of cpus. For now panic if this is not the case.
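As a usage sketch (not taken from the patch itself): on an x86 host,
core granularity would be selected by adding

    sched-gran=core

to the hypervisor command line (how that line is edited depends on the
bootloader in use). sched_select_granularity() accepts exactly the
strings "cpu", "core" and "socket"; anything else is rejected with
-EINVAL.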
Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2:
- fixed freeing of sched_res when merging cpus
- rename parameter to "sched-gran" (Jan Beulich)
- rename parameter option from "thread" to "cpu" (Jan Beulich)
---
 xen/arch/x86/setup.c       |   2 +
 xen/common/schedule.c      | 155 +++++++++++++++++++++++++++++++++++++++++++--
 xen/include/xen/sched-if.h |   4 +-
 xen/include/xen/sched.h    |   1 +
 4 files changed, 153 insertions(+), 9 deletions(-)

diff --git a/xen/arch/x86/setup.c b/xen/arch/x86/setup.c
index 3440794275..83854eeef8 100644
--- a/xen/arch/x86/setup.c
+++ b/xen/arch/x86/setup.c
@@ -1701,6 +1701,8 @@ void __init noreturn __start_xen(unsigned long mbi_p)
         printk(XENLOG_INFO "Parked %u CPUs\n", num_parked);
     smp_cpus_done();
 
+    scheduler_smp_init();
+
     do_initcalls();
 
     if ( opt_watchdog )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 4336f2bdf8..3e68259411 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -55,9 +55,32 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings);
 int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US;
 integer_param("sched_ratelimit_us", sched_ratelimit_us);
 
+static enum {
+    SCHED_GRAN_cpu,
+    SCHED_GRAN_core,
+    SCHED_GRAN_socket
+} opt_sched_granularity = SCHED_GRAN_cpu;
+
+#ifdef CONFIG_X86
+static int __init sched_select_granularity(const char *str)
+{
+    if (strcmp("cpu", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_cpu;
+    else if (strcmp("core", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_core;
+    else if (strcmp("socket", str) == 0)
+        opt_sched_granularity = SCHED_GRAN_socket;
+    else
+        return -EINVAL;
+
+    return 0;
+}
+custom_param("sched-gran", sched_select_granularity);
+#endif
+
 /* Number of vcpus per struct sched_item. */
 static unsigned int sched_granularity = 1;
-const cpumask_t *sched_res_mask = &cpumask_all;
+cpumask_var_t sched_res_mask;
 
 /* Various timer handlers. */
 static void s_timer_fn(void *unused);
@@ -323,6 +346,8 @@ static void sched_free_item(struct sched_item *item, struct vcpu *v)
     if ( item->vcpu == v )
         item->vcpu = v->next_in_list;
 
+    item->runstate_cnt[v->runstate.state]--;
+
     if ( !cnt )
         sched_free_item_mem(item);
 }
@@ -2113,8 +2138,14 @@ static int cpu_schedule_up(unsigned int cpu)
     sd = xzalloc(struct sched_resource);
     if ( sd == NULL )
         return -ENOMEM;
+    if ( !zalloc_cpumask_var(&sd->cpus) )
+    {
+        xfree(sd);
+        return -ENOMEM;
+    }
+
     sd->processor = cpu;
-    sd->cpus = cpumask_of(cpu);
+    cpumask_copy(sd->cpus, cpumask_of(cpu));
     per_cpu(sched_res, cpu) = sd;
 
     per_cpu(scheduler, cpu) = &ops;
@@ -2170,30 +2201,92 @@
     return 0;
 }
 
+static void sched_free_sched_res(struct sched_resource *sd)
+{
+    kill_timer(&sd->s_timer);
+    free_cpumask_var(sd->cpus);
+
+    xfree(sd);
+}
+
 static void cpu_schedule_down(unsigned int cpu)
 {
     struct sched_resource *sd = per_cpu(sched_res, cpu);
     struct scheduler *sched = per_cpu(scheduler, cpu);
 
+    cpumask_clear_cpu(cpu, sd->cpus);
+    per_cpu(sched_res, cpu) = NULL;
+
+    if ( cpumask_weight(sd->cpus) )
+        return;
+
     sched_free_pdata(sched, sd->sched_priv, cpu);
     sched_free_vdata(sched, idle_vcpu[cpu]->sched_item->priv);
     idle_vcpu[cpu]->sched_item->priv = NULL;
     sd->sched_priv = NULL;
+    cpumask_clear_cpu(cpu, sched_res_mask);
 
-    kill_timer(&sd->s_timer);
-
-    xfree(per_cpu(sched_res, cpu));
-    per_cpu(sched_res, cpu) = NULL;
+    sched_free_sched_res(sd);
 }
 
 void scheduler_percpu_init(unsigned int cpu)
 {
     struct scheduler *sched = per_cpu(scheduler, cpu);
     struct sched_resource *sd = per_cpu(sched_res, cpu);
+    const cpumask_t *mask;
+    unsigned int master_cpu;
+    spinlock_t *lock;
+    struct sched_item *old_item, *master_item;
+
+    if ( system_state == SYS_STATE_resume )
+        return;
+
+    switch ( opt_sched_granularity )
+    {
+    case SCHED_GRAN_cpu:
+        mask = cpumask_of(cpu);
+        break;
+    case SCHED_GRAN_core:
+        mask = per_cpu(cpu_sibling_mask, cpu);
+        break;
+    case SCHED_GRAN_socket:
+        mask = per_cpu(cpu_core_mask, cpu);
+        break;
+    default:
+        ASSERT_UNREACHABLE();
+        return;
+    }
 
-    if ( system_state != SYS_STATE_resume )
+    if ( cpu == 0 || cpumask_weight(mask) == 1 )
+    {
+        cpumask_set_cpu(cpu, sched_res_mask);
         sched_init_pdata(sched, sd->sched_priv, cpu);
+        return;
+    }
+
+    master_cpu = cpumask_first(mask);
+    master_item = idle_vcpu[master_cpu]->sched_item;
+    lock = pcpu_schedule_lock(master_cpu);
+
+    /* Merge idle_vcpu item and sched_resource into master cpu. */
+    old_item = idle_vcpu[cpu]->sched_item;
+    idle_vcpu[cpu]->sched_item = master_item;
+    per_cpu(sched_res, cpu) = per_cpu(sched_res, master_cpu);
+    per_cpu(sched_res_idx, cpu) = cpumask_weight(per_cpu(sched_res, cpu)->cpus);
+    cpumask_set_cpu(cpu, per_cpu(sched_res, cpu)->cpus);
+    master_item->runstate_cnt[RUNSTATE_running] +=
+        old_item->runstate_cnt[RUNSTATE_running];
+    master_item->runstate_cnt[RUNSTATE_runnable] +=
+        old_item->runstate_cnt[RUNSTATE_runnable];
+
+    pcpu_schedule_unlock(lock, master_cpu);
+
+    sched_free_pdata(sched, sd->sched_priv, cpu);
+    sched_free_vdata(sched, old_item->priv);
+
+    sched_free_sched_res(sd);
+    sched_free_item_mem(old_item);
 }
 
 static int cpu_schedule_callback(
@@ -2273,6 +2366,51 @@ static struct notifier_block cpu_schedule_nfb = {
     .notifier_call = cpu_schedule_callback
 };
 
+static unsigned int __init sched_check_granularity(void)
+{
+    unsigned int cpu;
+    unsigned int siblings, gran = 0;
+
+    for_each_online_cpu( cpu )
+    {
+        switch ( opt_sched_granularity )
+        {
+        case SCHED_GRAN_cpu:
+            /* If granularity is "cpu" we are fine already. */
+            return 1;
+        case SCHED_GRAN_core:
+            siblings = cpumask_weight(per_cpu(cpu_sibling_mask, cpu));
+            break;
+        case SCHED_GRAN_socket:
+            siblings = cpumask_weight(per_cpu(cpu_core_mask, cpu));
+            break;
+        default:
+            ASSERT_UNREACHABLE();
+            return 0;
+        }
+
+        if ( gran == 0 )
+            gran = siblings;
+        else if ( gran != siblings )
+            return 0;
+    }
+
+    return gran;
+}
+
+/* Setup data for selected scheduler granularity. */
+void __init scheduler_smp_init(void)
+{
+    unsigned int gran;
+
+    gran = sched_check_granularity();
+    if ( gran == 0 )
+        panic("Illegal cpu configuration for scheduling granularity!\n"
+              "Please use cpu scheduling.\n");
+
+    sched_granularity = gran;
+}
+
 /* Initialise the data structures. */
 void __init scheduler_init(void)
 {
@@ -2304,6 +2442,9 @@ void __init scheduler_init(void)
         printk("Using '%s' (%s)\n", ops.name, ops.opt_name);
     }
 
+    if ( !zalloc_cpumask_var(&sched_res_mask) )
+        BUG();
+
     if ( cpu_schedule_up(0) )
         BUG();
     register_cpu_notifier(&cpu_schedule_nfb);
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index f16d81ab4a..86525da77b 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -23,7 +23,7 @@ extern cpumask_t cpupool_free_cpus;
 extern int sched_ratelimit_us;
 
 /* Scheduling resource mask. */
-extern const cpumask_t *sched_res_mask;
+extern cpumask_var_t sched_res_mask;
 
 /*
  * In order to allow a scheduler to remap the lock->cpu mapping,
@@ -45,7 +45,7 @@ struct sched_resource {
     struct timer        s_timer;        /* scheduling timer                */
     atomic_t            urgent_count;   /* how many urgent vcpus           */
     unsigned            processor;
-    const cpumask_t    *cpus;           /* cpus covered by this struct     */
+    cpumask_var_t       cpus;           /* cpus covered by this struct     */
 };
 
 #define curr_on_cpu(c)  (per_cpu(sched_res, c)->curr)
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 5629602de5..0c37c2b55e 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -681,6 +681,7 @@ void noreturn asm_domain_crash_synchronous(unsigned long addr);
 
 void scheduler_init(void);
 void scheduler_percpu_init(unsigned int cpu);
+void scheduler_smp_init(void);
 int sched_init_vcpu(struct vcpu *v);
 void sched_destroy_vcpu(struct vcpu *v);
 int sched_init_domain(struct domain *d, int poolid);
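The uniformity test in sched_check_granularity() is what drives the new
panic: every online cpu has to report the same sibling count for the
chosen granularity. Below is a minimal standalone sketch of that logic
(plain C, not Xen code; check_granularity() is a hypothetical stand-in
for the real function, which obtains the counts from the
cpu_sibling_mask/cpu_core_mask per-cpu masks, and the toy sibling
counts are made-up examples):

#include <stdio.h>

/* Return the common sibling count, or 0 if the topology is asymmetric. */
static unsigned int check_granularity(const unsigned int *siblings,
                                      unsigned int nr_cpus)
{
    unsigned int cpu, gran = 0;

    for ( cpu = 0; cpu < nr_cpus; cpu++ )
    {
        if ( gran == 0 )
            gran = siblings[cpu];              /* first cpu sets the value */
        else if ( gran != siblings[cpu] )
            return 0;                          /* mismatch -> reject       */
    }

    return gran;
}

int main(void)
{
    /* 4 cpus, 2 threads per core each: fine for "core" granularity. */
    unsigned int symmetric[]  = { 2, 2, 2, 2 };
    /* One core has lost a thread: the check fails. */
    unsigned int asymmetric[] = { 2, 2, 2, 1 };

    printf("symmetric:  gran=%u\n", check_granularity(symmetric, 4));
    printf("asymmetric: gran=%u\n", check_granularity(asymmetric, 4));

    return 0;
}

Run on its own this prints gran=2 for the symmetric topology and gran=0
for the asymmetric one; the latter corresponds to the configuration
scheduler_smp_init() panics on, e.g. a host where a thread of one core
has been parked or offlined.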