From patchwork Mon Sep 30 05:21:17 2019
X-Patchwork-Submitter: Jürgen Groß
X-Patchwork-Id: 11166015
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
    George Dunlap, Andrew Cooper, Ian Jackson, Tim Deegan, Julien Grall,
    Jan Beulich, Dario Faggioli, Volodymyr Babchuk, Roger Pau Monné
Date: Mon, 30 Sep 2019 07:21:17 +0200
Message-Id: <20190930052135.11257-2-jgross@suse.com>
In-Reply-To: <20190930052135.11257-1-jgross@suse.com>
References: <20190930052135.11257-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v5 01/19] xen/sched: add code to sync scheduling of all vcpus of a sched unit

When switching sched units, synchronize all vcpus of the new unit so they
are scheduled at the same time. A variable sched_granularity is added
which holds the number of vcpus per schedule unit.

As tasklets require the idle unit to be scheduled, the
tasklet_work_scheduled parameter of do_schedule() has to be set to true
whenever any cpu covered by the current schedule() call has pending
tasklet work.

For joining the other vcpus of the schedule unit a new softirq
SCHED_SLAVE_SOFTIRQ is added. It provides a way to initiate a context
switch without calling the generic schedule() function to select the vcpu
to switch to, as we already know which vcpu we want to run. This has the
additional advantage of not losing any concurrent SCHEDULE_SOFTIRQ events.
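As a rough illustration of the rendezvous scheme described above, here is a
minimal user-space model (an assumption-laden sketch: the thread setup, the
counter names and the use of C11 atomics plus pthreads are invented for the
example; the real logic lives in sched_wait_rendezvous_in() and
sched_context_switched() in the diff below):

/*
 * Minimal user-space model of the rendezvous described above: every "cpu"
 * of a unit checks in, the last one in takes the scheduling decision for
 * the whole unit, and nobody finishes before everybody else has left.
 * Illustrative only -- counter handling is simplified vs. the Xen code.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define GRANULARITY 2                 /* vcpus (and hence cpus) per unit */

static atomic_int rendezvous_in = GRANULARITY;
static atomic_int rendezvous_out;     /* 0 means: no rendezvous pending  */

static void *cpu_thread(void *arg)
{
    int cpu = (int)(long)arg;

    /* Rendezvous in: the last cpu to arrive picks what the unit runs. */
    if ( atomic_fetch_sub(&rendezvous_in, 1) == 1 )
    {
        printf("cpu%d: deciding what the whole unit runs next\n", cpu);
        atomic_store(&rendezvous_out, GRANULARITY + 1);
    }
    else
        while ( !atomic_load(&rendezvous_out) )
            ;                         /* wait until the decision is made */

    /* Rendezvous out: whoever decrements the counter to 1 cleans up. */
    if ( atomic_fetch_sub(&rendezvous_out, 1) == 2 )
    {
        printf("cpu%d: last one out, doing the context_saved() step\n", cpu);
        atomic_store(&rendezvous_out, 0);
    }
    else
        while ( atomic_load(&rendezvous_out) )
            ;                         /* wait for the cleanup to finish  */

    return NULL;
}

int main(void)
{
    pthread_t t[GRANULARITY];

    for ( long i = 0; i < GRANULARITY; i++ )
        pthread_create(&t[i], NULL, cpu_thread, (void *)i);
    for ( int i = 0; i < GRANULARITY; i++ )
        pthread_join(t[i], NULL);

    return 0;
}

The property mirrored here is the one the patch relies on: the last cpu to
check in makes the decision for the whole unit, and the "out" counter keeps
any cpu from proceeding before the final cleanup (context_saved() in Xen)
has happened.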
Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli Acked-by: Jan Beulich --- RFC V2: - move syncing after context_switch() to schedule.c V2: - don't run tasklets directly from sched_wait_rendezvous_in() V3: - adapt array size in sched_move_domain() (Jan Beulich) - int -> unsigned int (Jan Beulich) V4: - renamed sd to sr in several places (Jan Beulich) - swap stop_timer() and NOW() calls (Jan Beulich) - context_switch() on ARM returns - handle that (Jan Beulich) --- xen/arch/arm/domain.c | 2 +- xen/arch/x86/domain.c | 3 +- xen/common/schedule.c | 353 +++++++++++++++++++++++++++++++++++---------- xen/common/softirq.c | 6 +- xen/include/xen/sched-if.h | 1 + xen/include/xen/sched.h | 16 +- xen/include/xen/softirq.h | 1 + 7 files changed, 294 insertions(+), 88 deletions(-) diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c index f0ee5a2140..460e968e97 100644 --- a/xen/arch/arm/domain.c +++ b/xen/arch/arm/domain.c @@ -318,7 +318,7 @@ static void schedule_tail(struct vcpu *prev) local_irq_enable(); - context_saved(prev); + sched_context_switched(prev, current); update_runstate_area(current); diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index c7fa224c89..27f99d3bcc 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -1784,7 +1784,6 @@ static void __context_switch(void) per_cpu(curr_vcpu, cpu) = n; } - void context_switch(struct vcpu *prev, struct vcpu *next) { unsigned int cpu = smp_processor_id(); @@ -1860,7 +1859,7 @@ void context_switch(struct vcpu *prev, struct vcpu *next) } } - context_saved(prev); + sched_context_switched(prev, next); _update_runstate_area(next); /* Must be done with interrupts enabled */ diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 4711ece1ef..ff67fb3633 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -61,6 +61,9 @@ boolean_param("sched_smt_power_savings", sched_smt_power_savings); int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US; integer_param("sched_ratelimit_us", sched_ratelimit_us); +/* Number of vcpus per struct sched_unit. */ +static unsigned int __read_mostly sched_granularity = 1; + /* Common lock for free cpus. */ static DEFINE_SPINLOCK(sched_free_cpu_lock); @@ -532,8 +535,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c) if ( IS_ERR(domdata) ) return PTR_ERR(domdata); - /* TODO: fix array size with multiple vcpus per unit. */ - unit_priv = xzalloc_array(void *, d->max_vcpus); + unit_priv = xzalloc_array(void *, + DIV_ROUND_UP(d->max_vcpus, sched_granularity)); if ( unit_priv == NULL ) { sched_free_domdata(c->sched, domdata); @@ -1714,133 +1717,325 @@ void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value) spin_unlock(&v->periodic_timer_lock); } -/* - * The main function - * - deschedule the current domain (scheduler independent). - * - pick a new domain (scheduler dependent). 
- */ -static void schedule(void) +static void sched_switch_units(struct sched_resource *sr, + struct sched_unit *next, struct sched_unit *prev, + s_time_t now) { - struct sched_unit *prev = current->sched_unit, *next = NULL; - s_time_t now; - struct scheduler *sched; - unsigned long *tasklet_work = &this_cpu(tasklet_work_to_do); - bool tasklet_work_scheduled = false; - struct sched_resource *sd; - spinlock_t *lock; - int cpu = smp_processor_id(); + sr->curr = next; - ASSERT_NOT_IN_ATOMIC(); + TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, prev->unit_id, + now - prev->state_entry_time); + TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, next->unit_id, + (next->vcpu_list->runstate.state == RUNSTATE_runnable) ? + (now - next->state_entry_time) : 0, prev->next_time); - SCHED_STAT_CRANK(sched_run); + ASSERT(prev->vcpu_list->runstate.state == RUNSTATE_running); + + TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id, + next->domain->domain_id, next->unit_id); + + sched_unit_runstate_change(prev, false, now); + + ASSERT(next->vcpu_list->runstate.state != RUNSTATE_running); + sched_unit_runstate_change(next, true, now); - sd = get_sched_res(cpu); + /* + * NB. Don't add any trace records from here until the actual context + * switch, else lost_records resume will not work properly. + */ + + ASSERT(!next->is_running); + next->vcpu_list->is_running = 1; + next->is_running = true; + next->state_entry_time = now; +} + +static bool sched_tasklet_check_cpu(unsigned int cpu) +{ + unsigned long *tasklet_work = &per_cpu(tasklet_work_to_do, cpu); - /* Update tasklet scheduling status. */ switch ( *tasklet_work ) { case TASKLET_enqueued: set_bit(_TASKLET_scheduled, tasklet_work); /* fallthrough */ case TASKLET_enqueued|TASKLET_scheduled: - tasklet_work_scheduled = true; + return true; break; case TASKLET_scheduled: clear_bit(_TASKLET_scheduled, tasklet_work); + /* fallthrough */ case 0: - /*tasklet_work_scheduled = false;*/ + /* return false; */ break; default: BUG(); } - lock = pcpu_schedule_lock_irq(cpu); + return false; +} - now = NOW(); +static bool sched_tasklet_check(unsigned int cpu) +{ + bool tasklet_work_scheduled = false; + const cpumask_t *mask = get_sched_res(cpu)->cpus; + unsigned int cpu_iter; + + for_each_cpu ( cpu_iter, mask ) + if ( sched_tasklet_check_cpu(cpu_iter) ) + tasklet_work_scheduled = true; - stop_timer(&sd->s_timer); + return tasklet_work_scheduled; +} + +static struct sched_unit *do_schedule(struct sched_unit *prev, s_time_t now, + unsigned int cpu) +{ + struct scheduler *sched = per_cpu(scheduler, cpu); + struct sched_resource *sr = get_sched_res(cpu); + struct sched_unit *next; /* get policy-specific decision on scheduling... */ - sched = this_cpu(scheduler); - sched->do_schedule(sched, prev, now, tasklet_work_scheduled); + sched->do_schedule(sched, prev, now, sched_tasklet_check(cpu)); next = prev->next_task; - sd->curr = next; - if ( prev->next_time >= 0 ) /* -ve means no limit */ - set_timer(&sd->s_timer, now + prev->next_time); + set_timer(&sr->s_timer, now + prev->next_time); + + if ( likely(prev != next) ) + sched_switch_units(sr, next, prev, now); + + return next; +} + +static void context_saved(struct vcpu *prev) +{ + struct sched_unit *unit = prev->sched_unit; + + /* Clear running flag /after/ writing context to memory. */ + smp_wmb(); + + prev->is_running = 0; + unit->is_running = false; + unit->state_entry_time = NOW(); + + /* Check for migration request /after/ clearing running flag. 
*/ + smp_mb(); + + sched_context_saved(vcpu_scheduler(prev), unit); - if ( unlikely(prev == next) ) + sched_unit_migrate_finish(unit); +} + +/* + * Rendezvous on end of context switch. + * As no lock is protecting this rendezvous function we need to use atomic + * access functions on the counter. + * The counter will be 0 in case no rendezvous is needed. For the rendezvous + * case it is initialised to the number of cpus to rendezvous plus 1. Each + * member entering decrements the counter. The last one will decrement it to + * 1 and perform the final needed action in that case (call of context_saved() + * if vcpu was switched), and then set the counter to zero. The other members + * will wait until the counter becomes zero until they proceed. + */ +void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) +{ + struct sched_unit *next = vnext->sched_unit; + + if ( atomic_read(&next->rendezvous_out_cnt) ) + { + int cnt = atomic_dec_return(&next->rendezvous_out_cnt); + + /* Call context_saved() before releasing other waiters. */ + if ( cnt == 1 ) + { + if ( vprev != vnext ) + context_saved(vprev); + atomic_set(&next->rendezvous_out_cnt, 0); + } + else + while ( atomic_read(&next->rendezvous_out_cnt) ) + cpu_relax(); + } + else if ( vprev != vnext ) + context_saved(vprev); +} + +static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, + s_time_t now) +{ + if ( unlikely(vprev == vnext) ) { - pcpu_schedule_unlock_irq(lock, cpu); TRACE_4D(TRC_SCHED_SWITCH_INFCONT, - next->domain->domain_id, next->unit_id, - now - prev->state_entry_time, - prev->next_time); - trace_continue_running(next->vcpu_list); - return continue_running(prev->vcpu_list); + vnext->domain->domain_id, vnext->sched_unit->unit_id, + now - vprev->runstate.state_entry_time, + vprev->sched_unit->next_time); + sched_context_switched(vprev, vnext); + trace_continue_running(vnext); + return continue_running(vprev); } - TRACE_3D(TRC_SCHED_SWITCH_INFPREV, - prev->domain->domain_id, prev->unit_id, - now - prev->state_entry_time); - TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, - next->domain->domain_id, next->unit_id, - (next->vcpu_list->runstate.state == RUNSTATE_runnable) ? - (now - next->state_entry_time) : 0, - prev->next_time); + SCHED_STAT_CRANK(sched_ctx); - ASSERT(prev->vcpu_list->runstate.state == RUNSTATE_running); + stop_timer(&vprev->periodic_timer); - TRACE_4D(TRC_SCHED_SWITCH, - prev->domain->domain_id, prev->unit_id, - next->domain->domain_id, next->unit_id); + if ( vnext->sched_unit->migrated ) + vcpu_move_irqs(vnext); - sched_unit_runstate_change(prev, false, now); + vcpu_periodic_timer_work(vnext); - ASSERT(next->vcpu_list->runstate.state != RUNSTATE_running); - sched_unit_runstate_change(next, true, now); + context_switch(vprev, vnext); +} - /* - * NB. Don't add any trace records from here until the actual context - * switch, else lost_records resume will not work properly. - */ +/* + * Rendezvous before taking a scheduling decision. + * Called with schedule lock held, so all accesses to the rendezvous counter + * can be normal ones (no atomic accesses needed). + * The counter is initialized to the number of cpus to rendezvous initially. + * Each cpu entering will decrement the counter. In case the counter becomes + * zero do_schedule() is called and the rendezvous counter for leaving + * context_switch() is set. All other members will wait until the counter is + * becoming zero, dropping the schedule lock in between. 
+ */ +static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, + spinlock_t **lock, int cpu, + s_time_t now) +{ + struct sched_unit *next; - ASSERT(!next->is_running); - next->vcpu_list->is_running = 1; - next->is_running = true; - next->state_entry_time = now; + if ( !--prev->rendezvous_in_cnt ) + { + next = do_schedule(prev, now, cpu); + atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1); + return next; + } - pcpu_schedule_unlock_irq(lock, cpu); + while ( prev->rendezvous_in_cnt ) + { + /* + * Coming from idle might need to do tasklet work. + * In order to avoid deadlocks we can't do that here, but have to + * continue the idle loop. + * Undo the rendezvous_in_cnt decrement and schedule another call of + * sched_slave(). + */ + if ( is_idle_unit(prev) && sched_tasklet_check_cpu(cpu) ) + { + struct vcpu *vprev = current; - SCHED_STAT_CRANK(sched_ctx); + prev->rendezvous_in_cnt++; + atomic_set(&prev->rendezvous_out_cnt, 0); + + pcpu_schedule_unlock_irq(*lock, cpu); + + raise_softirq(SCHED_SLAVE_SOFTIRQ); + sched_context_switch(vprev, vprev, now); + + return NULL; /* ARM only. */ + } - stop_timer(&prev->vcpu_list->periodic_timer); + pcpu_schedule_unlock_irq(*lock, cpu); - if ( next->migrated ) - vcpu_move_irqs(next->vcpu_list); + cpu_relax(); - vcpu_periodic_timer_work(next->vcpu_list); + *lock = pcpu_schedule_lock_irq(cpu); + } - context_switch(prev->vcpu_list, next->vcpu_list); + return prev->next_task; } -void context_saved(struct vcpu *prev) +static void sched_slave(void) { - /* Clear running flag /after/ writing context to memory. */ - smp_wmb(); + struct vcpu *vprev = current; + struct sched_unit *prev = vprev->sched_unit, *next; + s_time_t now; + spinlock_t *lock; + unsigned int cpu = smp_processor_id(); - prev->is_running = 0; - prev->sched_unit->is_running = false; - prev->sched_unit->state_entry_time = NOW(); + ASSERT_NOT_IN_ATOMIC(); - /* Check for migration request /after/ clearing running flag. */ - smp_mb(); + lock = pcpu_schedule_lock_irq(cpu); - sched_context_saved(vcpu_scheduler(prev), prev->sched_unit); + now = NOW(); + + if ( !prev->rendezvous_in_cnt ) + { + pcpu_schedule_unlock_irq(lock, cpu); + return; + } + + stop_timer(&get_sched_res(cpu)->s_timer); + + next = sched_wait_rendezvous_in(prev, &lock, cpu, now); + if ( !next ) + return; + + pcpu_schedule_unlock_irq(lock, cpu); - sched_unit_migrate_finish(prev->sched_unit); + sched_context_switch(vprev, next->vcpu_list, now); +} + +/* + * The main function + * - deschedule the current domain (scheduler independent). + * - pick a new domain (scheduler dependent). + */ +static void schedule(void) +{ + struct vcpu *vnext, *vprev = current; + struct sched_unit *prev = vprev->sched_unit, *next = NULL; + s_time_t now; + struct sched_resource *sr; + spinlock_t *lock; + int cpu = smp_processor_id(); + + ASSERT_NOT_IN_ATOMIC(); + + SCHED_STAT_CRANK(sched_run); + + sr = get_sched_res(cpu); + + lock = pcpu_schedule_lock_irq(cpu); + + if ( prev->rendezvous_in_cnt ) + { + /* + * We have a race: sched_slave() should be called, so raise a softirq + * in order to re-enter schedule() later and call sched_slave() now. 
+ */ + pcpu_schedule_unlock_irq(lock, cpu); + + raise_softirq(SCHEDULE_SOFTIRQ); + return sched_slave(); + } + + stop_timer(&sr->s_timer); + + now = NOW(); + + if ( sched_granularity > 1 ) + { + cpumask_t mask; + + prev->rendezvous_in_cnt = sched_granularity; + cpumask_andnot(&mask, sr->cpus, cpumask_of(cpu)); + cpumask_raise_softirq(&mask, SCHED_SLAVE_SOFTIRQ); + next = sched_wait_rendezvous_in(prev, &lock, cpu, now); + if ( !next ) + return; + } + else + { + prev->rendezvous_in_cnt = 0; + next = do_schedule(prev, now, cpu); + atomic_set(&next->rendezvous_out_cnt, 0); + } + + pcpu_schedule_unlock_irq(lock, cpu); + + vnext = next->vcpu_list; + sched_context_switch(vprev, vnext, now); } /* The scheduler timer: force a run through the scheduler */ @@ -1881,6 +2076,7 @@ static int cpu_schedule_up(unsigned int cpu) if ( sr == NULL ) return -ENOMEM; sr->master_cpu = cpu; + sr->cpus = cpumask_of(cpu); set_sched_res(cpu, sr); per_cpu(scheduler, cpu) = &sched_idle_ops; @@ -1901,6 +2097,8 @@ static int cpu_schedule_up(unsigned int cpu) if ( idle_vcpu[cpu] == NULL ) return -ENOMEM; + idle_vcpu[cpu]->sched_unit->rendezvous_in_cnt = 0; + /* * No need to allocate any scheduler data, as cpus coming online are * free initially and the idle scheduler doesn't need any data areas @@ -2001,6 +2199,7 @@ void __init scheduler_init(void) int i; open_softirq(SCHEDULE_SOFTIRQ, schedule); + open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave); for ( i = 0; i < NUM_SCHEDULERS; i++) { diff --git a/xen/common/softirq.c b/xen/common/softirq.c index 83c3c09bd5..2d66193203 100644 --- a/xen/common/softirq.c +++ b/xen/common/softirq.c @@ -33,8 +33,8 @@ static void __do_softirq(unsigned long ignore_mask) for ( ; ; ) { /* - * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ may move - * us to another processor. + * Initialise @cpu on every iteration: SCHEDULE_SOFTIRQ or + * SCHED_SLAVE_SOFTIRQ may move us to another processor. */ cpu = smp_processor_id(); @@ -55,7 +55,7 @@ void process_pending_softirqs(void) { ASSERT(!in_irq() && local_irq_is_enabled()); /* Do not enter scheduler as it can preempt the calling context. 
 */
-    __do_softirq(1ul<

From patchwork Mon Sep 30 05:21:18 2019
X-Patchwork-Id: 11166021
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
    George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
    Tim Deegan, Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich,
    Dario Faggioli
Date: Mon, 30 Sep 2019 07:21:18 +0200
Message-Id: <20190930052135.11257-3-jgross@suse.com>
In-Reply-To: <20190930052135.11257-1-jgross@suse.com>
References: <20190930052135.11257-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v5 02/19] xen/sched: introduce unit_runnable_state()

Today the runstate of a newly scheduled vcpu is always set to "running",
even if at that time vcpu_runnable() is already returning false due to a
race (e.g. with pausing the vcpu).

With core scheduling this can no longer work, as not all vcpus of a
schedule unit have to be "running" when the unit is scheduled. So the
vcpu's new runstate has to be selected at the same time as the runnability
of the related schedule unit is probed.

For this purpose introduce a new helper unit_runnable_state() which saves
the new runstate of all tested vcpus in a new field of the vcpu struct.
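To make the idea concrete, a small stand-alone sketch of this state
selection (struct and helper names are invented for the example; the actual
helper, unit_runnable_state(), is added to xen/include/xen/sched-if.h by the
diff below):

/*
 * Stand-alone model of choosing each vcpu's prospective runstate while
 * probing whether the unit is runnable at all.  Illustrative only.
 */
#include <stdbool.h>
#include <stdio.h>

enum runstate { RUNSTATE_running, RUNSTATE_runnable,
                RUNSTATE_blocked, RUNSTATE_offline };

struct demo_vcpu {
    bool blocked;            /* models VPF_blocked in pause_flags      */
    bool runnable;           /* models vcpu_runnable(v)                */
    enum runstate new_state; /* runstate to enter when switched in     */
};

/*
 * Probe runnability of a whole unit and, in the same pass, record the
 * runstate each vcpu will get.  In Xen this happens under the schedule
 * lock, so a concurrent vcpu_sleep() can't invalidate the choice between
 * the decision and the actual context switch.
 */
static bool demo_unit_runnable_state(struct demo_vcpu *vcpus, unsigned int n)
{
    bool ret = false;

    for ( unsigned int i = 0; i < n; i++ )
    {
        struct demo_vcpu *v = &vcpus[i];

        v->new_state = v->runnable ? RUNSTATE_running
                                   : v->blocked ? RUNSTATE_blocked
                                                : RUNSTATE_offline;
        ret |= v->runnable;
    }

    return ret;
}

int main(void)
{
    struct demo_vcpu unit[2] = {
        { .runnable = true },                    /* will become "running" */
        { .runnable = false, .blocked = true },  /* will become "blocked" */
    };

    printf("unit runnable: %d, vcpu1 new_state: %d\n",
           demo_unit_runnable_state(unit, 2), unit[1].new_state);

    return 0;
}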
Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- RFC V2: - new patch V3: - add vcpu loop to unit_runnable_state() right now instead of doing so in next patch (Jan Beulich, Dario Faggioli) - make new_state unsigned int (Jan Beulich) V4: - add comment explaining unit_runnable_state() (Jan Beulich) --- xen/common/domain.c | 1 + xen/common/sched_arinc653.c | 2 +- xen/common/sched_credit.c | 49 ++++++++++++++++++++++++--------------------- xen/common/sched_credit2.c | 7 ++++--- xen/common/sched_null.c | 3 ++- xen/common/sched_rt.c | 8 +++++++- xen/common/schedule.c | 2 +- xen/include/xen/sched-if.h | 30 +++++++++++++++++++++++++++ xen/include/xen/sched.h | 1 + 9 files changed, 73 insertions(+), 30 deletions(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index 601da28c9c..a9882509ed 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -157,6 +157,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id) if ( is_idle_domain(d) ) { v->runstate.state = RUNSTATE_running; + v->new_state = RUNSTATE_running; } else { diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c index fcf81db19a..dd5876eacd 100644 --- a/xen/common/sched_arinc653.c +++ b/xen/common/sched_arinc653.c @@ -563,7 +563,7 @@ a653sched_do_schedule( if ( !((new_task != NULL) && (AUNIT(new_task) != NULL) && AUNIT(new_task)->awake - && unit_runnable(new_task)) ) + && unit_runnable_state(new_task)) ) new_task = IDLETASK(cpu); BUG_ON(new_task == NULL); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 299eff21ac..00beac3ea4 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -1894,7 +1894,7 @@ static void csched_schedule( if ( !test_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags) && !tasklet_work_scheduled && prv->ratelimit - && unit_runnable(unit) + && unit_runnable_state(unit) && !is_idle_unit(unit) && runtime < prv->ratelimit ) { @@ -1939,33 +1939,36 @@ static void csched_schedule( dec_nr_runnable(sched_cpu); } - snext = __runq_elem(runq->next); - - /* Tasklet work (which runs in idle UNIT context) overrides all else. */ - if ( tasklet_work_scheduled ) - { - TRACE_0D(TRC_CSCHED_SCHED_TASKLET); - snext = CSCHED_UNIT(sched_idle_unit(sched_cpu)); - snext->pri = CSCHED_PRI_TS_BOOST; - } - /* * Clear YIELD flag before scheduling out */ clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags); - /* - * SMP Load balance: - * - * If the next highest priority local runnable UNIT has already eaten - * through its credits, look on other PCPUs to see if we have more - * urgent work... If not, csched_load_balance() will return snext, but - * already removed from the runq. - */ - if ( snext->pri > CSCHED_PRI_TS_OVER ) - __runq_remove(snext); - else - snext = csched_load_balance(prv, sched_cpu, snext, &migrated); + do { + snext = __runq_elem(runq->next); + + /* Tasklet work (which runs in idle UNIT context) overrides all else. */ + if ( tasklet_work_scheduled ) + { + TRACE_0D(TRC_CSCHED_SCHED_TASKLET); + snext = CSCHED_UNIT(sched_idle_unit(sched_cpu)); + snext->pri = CSCHED_PRI_TS_BOOST; + } + + /* + * SMP Load balance: + * + * If the next highest priority local runnable UNIT has already eaten + * through its credits, look on other PCPUs to see if we have more + * urgent work... If not, csched_load_balance() will return snext, but + * already removed from the runq. 
+ */ + if ( snext->pri > CSCHED_PRI_TS_OVER ) + __runq_remove(snext); + else + snext = csched_load_balance(prv, sched_cpu, snext, &migrated); + + } while ( !unit_runnable_state(snext->unit) ); /* * Update idlers mask if necessary. When we're idling, other CPUs diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index 87d142bbe4..0e29e56d5a 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -3291,7 +3291,7 @@ runq_candidate(struct csched2_runqueue_data *rqd, * In fact, it may be the case that scurr is about to spin, and there's * no point forcing it to do so until rate limiting expires. */ - if ( !yield && prv->ratelimit_us && unit_runnable(scurr->unit) && + if ( !yield && prv->ratelimit_us && unit_runnable_state(scurr->unit) && (now - scurr->unit->state_entry_time) < MICROSECS(prv->ratelimit_us) ) { if ( unlikely(tb_init_done) ) @@ -3345,7 +3345,7 @@ runq_candidate(struct csched2_runqueue_data *rqd, * * Of course, we also default to idle also if scurr is not runnable. */ - if ( unit_runnable(scurr->unit) && !soft_aff_preempt ) + if ( unit_runnable_state(scurr->unit) && !soft_aff_preempt ) snext = scurr; else snext = csched2_unit(sched_idle_unit(cpu)); @@ -3405,7 +3405,8 @@ runq_candidate(struct csched2_runqueue_data *rqd, * some budget, then choose it. */ if ( (yield || svc->credit > snext->credit) && - (!has_cap(svc) || unit_grab_budget(svc)) ) + (!has_cap(svc) || unit_grab_budget(svc)) && + unit_runnable_state(svc->unit) ) snext = svc; /* In any case, if we got this far, break. */ diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c index 80a7d45935..3dde1dcd00 100644 --- a/xen/common/sched_null.c +++ b/xen/common/sched_null.c @@ -864,7 +864,8 @@ static void null_schedule(const struct scheduler *ops, struct sched_unit *prev, cpumask_set_cpu(sched_cpu, &prv->cpus_free); } - if ( unlikely(prev->next_task == NULL || !unit_runnable(prev->next_task)) ) + if ( unlikely(prev->next_task == NULL || + !unit_runnable_state(prev->next_task)) ) prev->next_task = sched_idle_unit(sched_cpu); NULL_UNIT_CHECK(prev->next_task); diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index cfd7d334fa..fd882f2ca4 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -1092,12 +1092,18 @@ rt_schedule(const struct scheduler *ops, struct sched_unit *currunit, else { snext = runq_pick(ops, cpumask_of(sched_cpu)); + if ( snext == NULL ) snext = rt_unit(sched_idle_unit(sched_cpu)); + else if ( !unit_runnable_state(snext->unit) ) + { + q_remove(snext); + snext = rt_unit(sched_idle_unit(sched_cpu)); + } /* if scurr has higher priority and budget, still pick scurr */ if ( !is_idle_unit(currunit) && - unit_runnable(currunit) && + unit_runnable_state(currunit) && scurr->cur_budget > 0 && ( is_idle_unit(snext->unit) || compare_unit_priority(scurr, snext) > 0 ) ) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index ff67fb3633..9c1b044b49 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -280,7 +280,7 @@ static inline void sched_unit_runstate_change(struct sched_unit *unit, for_each_sched_unit_vcpu ( unit, v ) { if ( running ) - vcpu_runstate_change(v, RUNSTATE_running, new_entry_time); + vcpu_runstate_change(v, v->new_state, new_entry_time); else vcpu_runstate_change(v, ((v->pause_flags & VPF_blocked) ? 
RUNSTATE_blocked : diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index c65dfa943b..7e568a9d9f 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -93,6 +93,36 @@ static inline bool unit_runnable(const struct sched_unit *unit) return false; } +/* + * Returns whether a sched_unit is runnable and sets new_state for each of its + * vcpus. It is mandatory to determine the new runstate for all vcpus of a unit + * without dropping the schedule lock (which happens when synchronizing the + * context switch of the vcpus of a unit) in order to avoid races with e.g. + * vcpu_sleep(). + */ +static inline bool unit_runnable_state(const struct sched_unit *unit) +{ + struct vcpu *v; + bool runnable, ret = false; + + if ( is_idle_unit(unit) ) + return true; + + for_each_sched_unit_vcpu ( unit, v ) + { + runnable = vcpu_runnable(v); + + v->new_state = runnable ? RUNSTATE_running + : (v->pause_flags & VPF_blocked) + ? RUNSTATE_blocked : RUNSTATE_offline; + + if ( runnable ) + ret = true; + } + + return ret; +} + static inline void sched_set_res(struct sched_unit *unit, struct sched_resource *res) { diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index c770ab4aa0..12f00cd78d 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -174,6 +174,7 @@ struct vcpu XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat; } runstate_guest; /* guest address */ #endif + unsigned int new_state; /* Has the FPU been initialised? */ bool fpu_initialised; From patchwork Mon Sep 30 05:21:19 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166005 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id E44F416C1 for ; Mon, 30 Sep 2019 05:22:59 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id CA0BE20815 for ; Mon, 30 Sep 2019 05:22:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org CA0BE20815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo80-0001ua-95; Mon, 30 Sep 2019 05:21:44 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo7y-0001uU-JG for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:21:42 +0000 X-Inumbo-ID: 2e7db179-e342-11e9-96c8-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 2e7db179-e342-11e9-96c8-12813bfff9fa; Mon, 30 Sep 2019 05:21:40 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id AA363B021; Mon, 30 Sep 2019 05:21:39 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:19 +0200 Message-Id: <20190930052135.11257-4-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: 
<20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 03/19] xen/sched: add support for multiple vcpus per sched unit where missing X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Julien Grall , Jan Beulich , Dario Faggioli MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" In several places there is support for multiple vcpus per sched unit missing. Add that missing support (with the exception of initial allocation) and missing helpers for that. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli Acked-by: Jan Beulich --- RFC V2: - fix vcpu_runstate_helper() V1: - add special handling for idle unit in unit_runnable() and unit_runnable_state() V2: - handle affinity_broken correctly (Jan Beulich) V3: - type for cpu ->unsigned int (Jan Beulich) --- xen/common/domain.c | 5 ++++- xen/common/schedule.c | 9 +++++---- xen/include/xen/sched-if.h | 16 +++++++++++++++- 3 files changed, 24 insertions(+), 6 deletions(-) diff --git a/xen/common/domain.c b/xen/common/domain.c index a9882509ed..93aa856bcb 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -1273,7 +1273,10 @@ int vcpu_reset(struct vcpu *v) v->async_exception_mask = 0; memset(v->async_exception_state, 0, sizeof(v->async_exception_state)); #endif - v->affinity_broken = 0; + if ( v->affinity_broken & VCPU_AFFINITY_OVERRIDE ) + vcpu_temporary_affinity(v, NR_CPUS, VCPU_AFFINITY_OVERRIDE); + if ( v->affinity_broken & VCPU_AFFINITY_WAIT ) + vcpu_temporary_affinity(v, NR_CPUS, VCPU_AFFINITY_WAIT); clear_bit(_VPF_blocked, &v->pause_flags); clear_bit(_VPF_in_reset, &v->pause_flags); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 9c1b044b49..3094ff6838 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -252,8 +252,9 @@ static inline void vcpu_runstate_change( s_time_t delta; struct sched_unit *unit = v->sched_unit; - ASSERT(v->runstate.state != new_state); ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock)); + if ( v->runstate.state == new_state ) + return; vcpu_urgent_count_update(v); @@ -1729,14 +1730,14 @@ static void sched_switch_units(struct sched_resource *sr, (next->vcpu_list->runstate.state == RUNSTATE_runnable) ? 
(now - next->state_entry_time) : 0, prev->next_time); - ASSERT(prev->vcpu_list->runstate.state == RUNSTATE_running); + ASSERT(unit_running(prev)); TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id, next->domain->domain_id, next->unit_id); sched_unit_runstate_change(prev, false, now); - ASSERT(next->vcpu_list->runstate.state != RUNSTATE_running); + ASSERT(!unit_running(next)); sched_unit_runstate_change(next, true, now); /* @@ -1858,7 +1859,7 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) while ( atomic_read(&next->rendezvous_out_cnt) ) cpu_relax(); } - else if ( vprev != vnext ) + else if ( vprev != vnext && sched_granularity == 1 ) context_saved(vprev); } diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 7e568a9d9f..983f2ece83 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -81,6 +81,11 @@ static inline bool is_unit_online(const struct sched_unit *unit) return false; } +static inline unsigned int unit_running(const struct sched_unit *unit) +{ + return unit->runstate_cnt[RUNSTATE_running]; +} + /* Returns true if at least one vcpu of the unit is runnable. */ static inline bool unit_runnable(const struct sched_unit *unit) { @@ -126,7 +131,16 @@ static inline bool unit_runnable_state(const struct sched_unit *unit) static inline void sched_set_res(struct sched_unit *unit, struct sched_resource *res) { - unit->vcpu_list->processor = res->master_cpu; + unsigned int cpu = cpumask_first(res->cpus); + struct vcpu *v; + + for_each_sched_unit_vcpu ( unit, v ) + { + ASSERT(cpu < nr_cpu_ids); + v->processor = cpu; + cpu = cpumask_next(cpu, res->cpus); + } + unit->res = res; } From patchwork Mon Sep 30 05:21:20 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166011 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 970B816C1 for ; Mon, 30 Sep 2019 05:23:04 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 72337218AC for ; Mon, 30 Sep 2019 05:23:04 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 72337218AC Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo81-0001up-Hy; Mon, 30 Sep 2019 05:21:45 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo80-0001uZ-AK for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:21:44 +0000 X-Inumbo-ID: 2ed02a5c-e342-11e9-97fb-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 2ed02a5c-e342-11e9-97fb-bc764e2007e4; Mon, 30 Sep 2019 05:21:40 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0E1C8B027; Mon, 30 Sep 2019 05:21:40 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:20 +0200 
Message-Id: <20190930052135.11257-5-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 04/19] xen/sched: modify cpupool_domain_cpumask() to be an unit mask X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Robert VanVossen , Dario Faggioli , Julien Grall , Josh Whitehead , Meng Xu , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" cpupool_domain_cpumask() is used by scheduling to select cpus or to iterate over cpus. In order to support scheduling units spanning multiple cpus rename cpupool_domain_cpumask() to cpupool_domain_master_cpumask() and let it return a cpumask with only one bit set per scheduling resource. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V4: - rename to cpupool_domain_master_cpumask() (Jan Beulich) - check return value of zalloc_cpumask_var() (Jan Beulich) --- xen/common/cpupool.c | 27 ++++++++++++++++++--------- xen/common/domain.c | 2 +- xen/common/domctl.c | 2 +- xen/common/sched_arinc653.c | 2 +- xen/common/sched_credit.c | 4 ++-- xen/common/sched_credit2.c | 22 +++++++++++----------- xen/common/sched_null.c | 8 ++++---- xen/common/sched_rt.c | 8 ++++---- xen/common/schedule.c | 13 +++++++------ xen/include/xen/sched-if.h | 9 ++++++--- 10 files changed, 55 insertions(+), 42 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index fd30040922..441a26f16c 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -36,26 +36,33 @@ static DEFINE_SPINLOCK(cpupool_lock); DEFINE_PER_CPU(struct cpupool *, cpupool); +static void free_cpupool_struct(struct cpupool *c) +{ + if ( c ) + { + free_cpumask_var(c->res_valid); + free_cpumask_var(c->cpu_valid); + } + xfree(c); +} + static struct cpupool *alloc_cpupool_struct(void) { struct cpupool *c = xzalloc(struct cpupool); - if ( !c || !zalloc_cpumask_var(&c->cpu_valid) ) + if ( !c ) + return NULL; + + if ( !zalloc_cpumask_var(&c->cpu_valid) || + !zalloc_cpumask_var(&c->res_valid) ) { - xfree(c); + free_cpupool_struct(c); c = NULL; } return c; } -static void free_cpupool_struct(struct cpupool *c) -{ - if ( c ) - free_cpumask_var(c->cpu_valid); - xfree(c); -} - /* * find a cpupool by it's id. 
to be called with cpupool lock held * if exact is not specified, the first cpupool with an id larger or equal to @@ -269,6 +276,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) cpupool_cpu_moving = NULL; } cpumask_set_cpu(cpu, c->cpu_valid); + cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask); rcu_read_lock(&domlist_read_lock); for_each_domain_in_cpupool(d, c) @@ -361,6 +369,7 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) atomic_inc(&c->refcnt); cpupool_cpu_moving = c; cpumask_clear_cpu(cpu, c->cpu_valid); + cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask); out: spin_unlock(&cpupool_lock); diff --git a/xen/common/domain.c b/xen/common/domain.c index 93aa856bcb..9c7360ed2a 100644 --- a/xen/common/domain.c +++ b/xen/common/domain.c @@ -584,7 +584,7 @@ void domain_update_node_affinity(struct domain *d) return; } - online = cpupool_domain_cpumask(d); + online = cpupool_domain_master_cpumask(d); spin_lock(&d->node_affinity_lock); diff --git a/xen/common/domctl.c b/xen/common/domctl.c index 8a694e0d37..d597a09f98 100644 --- a/xen/common/domctl.c +++ b/xen/common/domctl.c @@ -619,7 +619,7 @@ long do_domctl(XEN_GUEST_HANDLE_PARAM(xen_domctl_t) u_domctl) if ( op->cmd == XEN_DOMCTL_setvcpuaffinity ) { cpumask_var_t new_affinity, old_affinity; - cpumask_t *online = cpupool_domain_cpumask(v->domain); + cpumask_t *online = cpupool_domain_master_cpumask(v->domain); /* * We want to be able to restore hard affinity if we are trying diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c index dd5876eacd..45c05c6cd9 100644 --- a/xen/common/sched_arinc653.c +++ b/xen/common/sched_arinc653.c @@ -614,7 +614,7 @@ a653sched_pick_resource(const struct scheduler *ops, * If present, prefer unit's current processor, else * just find the first valid unit. 
*/ - online = cpupool_domain_cpumask(unit->domain); + online = cpupool_domain_master_cpumask(unit->domain); cpu = cpumask_first(online); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 00beac3ea4..a6dff8ec62 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -361,7 +361,7 @@ static inline void __runq_tickle(struct csched_unit *new) ASSERT(cur); cpumask_clear(&mask); - online = cpupool_domain_cpumask(new->sdom->dom); + online = cpupool_domain_master_cpumask(new->sdom->dom); cpumask_and(&idle_mask, prv->idlers, online); idlers_empty = cpumask_empty(&idle_mask); @@ -724,7 +724,7 @@ _csched_cpu_pick(const struct scheduler *ops, const struct sched_unit *unit, /* We must always use cpu's scratch space */ cpumask_t *cpus = cpumask_scratch_cpu(cpu); cpumask_t idlers; - cpumask_t *online = cpupool_domain_cpumask(unit->domain); + cpumask_t *online = cpupool_domain_master_cpumask(unit->domain); struct csched_pcpu *spc = NULL; int balance_step; diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index 0e29e56d5a..d51df05887 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -705,7 +705,7 @@ static int get_fallback_cpu(struct csched2_unit *svc) affinity_balance_cpumask(unit, bs, cpumask_scratch_cpu(cpu)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); /* * This is cases 1 or 3 (depending on bs): if processor is (still) @@ -1440,7 +1440,7 @@ runq_tickle(const struct scheduler *ops, struct csched2_unit *new, s_time_t now) struct sched_unit *unit = new->unit; unsigned int bs, cpu = sched_unit_master(unit); struct csched2_runqueue_data *rqd = c2rqd(ops, cpu); - cpumask_t *online = cpupool_domain_cpumask(unit->domain); + cpumask_t *online = cpupool_domain_master_cpumask(unit->domain); cpumask_t mask; ASSERT(new->rqd == rqd); @@ -2243,7 +2243,7 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit) } cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); /* * First check to see if we're here because someone else suggested a place @@ -2358,8 +2358,8 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit) * ok because: * - we know that unit->cpu_hard_affinity and ->cpu_soft_affinity have * a non-empty intersection (because has_soft is true); - * - we have unit->cpu_hard_affinity & cpupool_domain_cpumask() already - * in cpumask_scratch, we do save a lot doing like this. + * - we have unit->cpu_hard_affinity & cpupool_domain_master_cpumask() + * already in cpumask_scratch, we do save a lot doing like this. * * It's kind of like open coding affinity_balance_cpumask() but, in * this specific case, calling that would mean a lot of (unnecessary) @@ -2378,7 +2378,7 @@ csched2_res_pick(const struct scheduler *ops, const struct sched_unit *unit) * affinity, so go for it. * * cpumask_scratch already has unit->cpu_hard_affinity & - * cpupool_domain_cpumask() in it, so it's enough that we filter + * cpupool_domain_master_cpumask() in it, so it's enough that we filter * with the cpus of the runq. 
*/ cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), @@ -2513,7 +2513,7 @@ static void migrate(const struct scheduler *ops, _runq_deassign(svc); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), &trqd->active); sched_set_res(unit, @@ -2547,7 +2547,7 @@ static bool unit_is_migrateable(struct csched2_unit *svc, int cpu = sched_unit_master(unit); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); return !(svc->flags & CSFLAG_runq_migrate_request) && cpumask_intersects(cpumask_scratch_cpu(cpu), &rqd->active); @@ -2763,7 +2763,7 @@ csched2_unit_migrate( * v->processor will be chosen, and during actual domain unpause that * the unit will be assigned to and added to the proper runqueue. */ - if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_cpumask(d))) ) + if ( unlikely(!cpumask_test_cpu(new_cpu, cpupool_domain_master_cpumask(d))) ) { ASSERT(system_state == SYS_STATE_suspend); if ( unit_on_runq(svc) ) @@ -3069,7 +3069,7 @@ csched2_alloc_domdata(const struct scheduler *ops, struct domain *dom) sdom->nr_units = 0; init_timer(&sdom->repl_timer, replenish_domain_budget, sdom, - cpumask_any(cpupool_domain_cpumask(dom))); + cpumask_any(cpupool_domain_master_cpumask(dom))); spin_lock_init(&sdom->budget_lock); INIT_LIST_HEAD(&sdom->parked_units); @@ -3317,7 +3317,7 @@ runq_candidate(struct csched2_runqueue_data *rqd, cpumask_scratch); if ( unlikely(!cpumask_test_cpu(cpu, cpumask_scratch)) ) { - cpumask_t *online = cpupool_domain_cpumask(scurr->unit->domain); + cpumask_t *online = cpupool_domain_master_cpumask(scurr->unit->domain); /* Ok, is any of the pcpus in scurr soft-affinity idle? 
*/ cpumask_and(cpumask_scratch, cpumask_scratch, &rqd->idle); diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c index 3dde1dcd00..2525464a7c 100644 --- a/xen/common/sched_null.c +++ b/xen/common/sched_null.c @@ -125,7 +125,7 @@ static inline bool unit_check_affinity(struct sched_unit *unit, { affinity_balance_cpumask(unit, balance_step, cpumask_scratch_cpu(cpu)); cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu), - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); return cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu)); } @@ -266,7 +266,7 @@ pick_res(struct null_private *prv, const struct sched_unit *unit) { unsigned int bs; unsigned int cpu = sched_unit_master(unit), new_cpu; - cpumask_t *cpus = cpupool_domain_cpumask(unit->domain); + cpumask_t *cpus = cpupool_domain_master_cpumask(unit->domain); ASSERT(spin_is_locked(get_sched_res(cpu)->schedule_lock)); @@ -467,7 +467,7 @@ static void null_unit_insert(const struct scheduler *ops, lock = unit_schedule_lock(unit); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); /* If the pCPU is free, we assign unit to it */ if ( likely(per_cpu(npc, cpu).unit == NULL) ) @@ -579,7 +579,7 @@ static void null_unit_wake(const struct scheduler *ops, spin_unlock(&prv->waitq_lock); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(unit->domain)); + cpupool_domain_master_cpumask(unit->domain)); if ( !cpumask_intersects(&prv->cpus_free, cpumask_scratch_cpu(cpu)) ) { diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index fd882f2ca4..d21c416cae 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -326,7 +326,7 @@ rt_dump_unit(const struct scheduler *ops, const struct rt_unit *svc) */ mask = cpumask_scratch_cpu(sched_unit_master(svc->unit)); - cpupool_mask = cpupool_domain_cpumask(svc->unit->domain); + cpupool_mask = cpupool_domain_master_cpumask(svc->unit->domain); cpumask_and(mask, cpupool_mask, svc->unit->cpu_hard_affinity); printk("[%5d.%-2u] cpu %u, (%"PRI_stime", %"PRI_stime")," " cur_b=%"PRI_stime" cur_d=%"PRI_stime" last_start=%"PRI_stime"\n" @@ -642,7 +642,7 @@ rt_res_pick(const struct scheduler *ops, const struct sched_unit *unit) cpumask_t *online; int cpu; - online = cpupool_domain_cpumask(unit->domain); + online = cpupool_domain_master_cpumask(unit->domain); cpumask_and(&cpus, online, unit->cpu_hard_affinity); cpu = cpumask_test_cpu(sched_unit_master(unit), &cpus) @@ -1016,7 +1016,7 @@ runq_pick(const struct scheduler *ops, const cpumask_t *mask) iter_svc = q_elem(iter); /* mask cpu_hard_affinity & cpupool & mask */ - online = cpupool_domain_cpumask(iter_svc->unit->domain); + online = cpupool_domain_master_cpumask(iter_svc->unit->domain); cpumask_and(&cpu_common, online, iter_svc->unit->cpu_hard_affinity); cpumask_and(&cpu_common, mask, &cpu_common); if ( cpumask_empty(&cpu_common) ) @@ -1191,7 +1191,7 @@ runq_tickle(const struct scheduler *ops, struct rt_unit *new) if ( new == NULL || is_idle_unit(new->unit) ) return; - online = cpupool_domain_cpumask(new->unit->domain); + online = cpupool_domain_master_cpumask(new->unit->domain); cpumask_and(¬_tickled, online, new->unit->cpu_hard_affinity); cpumask_andnot(¬_tickled, ¬_tickled, &prv->tickled); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 3094ff6838..36b1d3df6e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -63,6 +63,7 @@ 
integer_param("sched_ratelimit_us", sched_ratelimit_us); /* Number of vcpus per struct sched_unit. */ static unsigned int __read_mostly sched_granularity = 1; +const cpumask_t *sched_res_mask = &cpumask_all; /* Common lock for free cpus. */ static DEFINE_SPINLOCK(sched_free_cpu_lock); @@ -188,7 +189,7 @@ static inline struct scheduler *vcpu_scheduler(const struct vcpu *v) { return unit_scheduler(v->sched_unit); } -#define VCPU2ONLINE(_v) cpupool_domain_cpumask((_v)->domain) +#define VCPU2ONLINE(_v) cpupool_domain_master_cpumask((_v)->domain) static inline void trace_runstate_change(struct vcpu *v, int new_state) { @@ -425,9 +426,9 @@ static unsigned int sched_select_initial_cpu(const struct vcpu *v) cpumask_clear(cpus); for_each_node_mask ( node, d->node_affinity ) cpumask_or(cpus, cpus, &node_to_cpumask(node)); - cpumask_and(cpus, cpus, cpupool_domain_cpumask(d)); + cpumask_and(cpus, cpus, d->cpupool->cpu_valid); if ( cpumask_empty(cpus) ) - cpumask_copy(cpus, cpupool_domain_cpumask(d)); + cpumask_copy(cpus, d->cpupool->cpu_valid); if ( v->vcpu_id == 0 ) cpu_ret = cpumask_first(cpus); @@ -973,7 +974,7 @@ void restore_vcpu_affinity(struct domain *d) lock = unit_schedule_lock_irq(unit); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(d)); + cpupool_domain_master_cpumask(d)); if ( cpumask_empty(cpumask_scratch_cpu(cpu)) ) { if ( sched_check_affinity_broken(unit) ) @@ -981,7 +982,7 @@ void restore_vcpu_affinity(struct domain *d) sched_set_affinity(unit, unit->cpu_hard_affinity_saved, NULL); sched_reset_affinity_broken(unit); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(d)); + cpupool_domain_master_cpumask(d)); } if ( cpumask_empty(cpumask_scratch_cpu(cpu)) ) @@ -991,7 +992,7 @@ void restore_vcpu_affinity(struct domain *d) unit->vcpu_list); sched_set_affinity(unit, &cpumask_all, NULL); cpumask_and(cpumask_scratch_cpu(cpu), unit->cpu_hard_affinity, - cpupool_domain_cpumask(d)); + cpupool_domain_master_cpumask(d)); } } diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 983f2ece83..1b296b150f 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -22,6 +22,8 @@ extern cpumask_t cpupool_free_cpus; #define SCHED_DEFAULT_RATELIMIT_US 1000 extern int sched_ratelimit_us; +/* Scheduling resource mask. */ +extern const cpumask_t *sched_res_mask; /* * In order to allow a scheduler to remap the lock->cpu mapping, @@ -535,6 +537,7 @@ struct cpupool int cpupool_id; unsigned int n_dom; cpumask_var_t cpu_valid; /* all cpus assigned to pool */ + cpumask_var_t res_valid; /* all scheduling resources of pool */ struct cpupool *next; struct scheduler *sched; atomic_t refcnt; @@ -543,14 +546,14 @@ struct cpupool #define cpupool_online_cpumask(_pool) \ (((_pool) == NULL) ? &cpu_online_map : (_pool)->cpu_valid) -static inline cpumask_t *cpupool_domain_cpumask(const struct domain *d) +static inline cpumask_t *cpupool_domain_master_cpumask(const struct domain *d) { /* * d->cpupool is NULL only for the idle domain, and no one should * be interested in calling this for the idle domain. 
*/ ASSERT(d->cpupool != NULL); - return d->cpupool->cpu_valid; + return d->cpupool->res_valid; } /* @@ -590,7 +593,7 @@ static inline cpumask_t *cpupool_domain_cpumask(const struct domain *d) static inline int has_soft_affinity(const struct sched_unit *unit) { return unit->soft_aff_effective && - !cpumask_subset(cpupool_domain_cpumask(unit->domain), + !cpumask_subset(cpupool_domain_master_cpumask(unit->domain), unit->cpu_soft_affinity); } From patchwork Mon Sep 30 05:21:21 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166009 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 29E7A13B1 for ; Mon, 30 Sep 2019 05:23:03 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 1014220815 for ; Mon, 30 Sep 2019 05:23:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1014220815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo89-0001yi-MP; Mon, 30 Sep 2019 05:21:53 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo88-0001y2-Fd for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:21:52 +0000 X-Inumbo-ID: 300d1d30-e342-11e9-96c8-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 300d1d30-e342-11e9-96c8-12813bfff9fa; Mon, 30 Sep 2019 05:21:43 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 32308B044; Mon, 30 Sep 2019 05:21:40 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:21 +0200 Message-Id: <20190930052135.11257-6-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 05/19] xen/sched: support allocating multiple vcpus into one sched unit X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , George Dunlap , Dario Faggioli MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" With a scheduling granularity greater than 1 multiple vcpus share the same struct sched_unit. Support that. Setting the initial processor must be done carefully: we can't use sched_set_res() as that relies on for_each_sched_unit_vcpu() which in turn needs the vcpu already as a member of the domain's vcpu linked list, which isn't the case. 
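For orientation, a tiny sketch of the vcpu-to-unit grouping this implies
(only the division rule and the "unit_id equals the lowest vcpu_id of the
unit" convention are taken from the sched_alloc_unit() changes below; the
program itself is purely illustrative):

/*
 * With sched_granularity vcpus per unit, vcpus are grouped by
 * vcpu_id / sched_granularity, and a unit's unit_id is the lowest
 * vcpu_id it contains.  E.g. with sched_granularity == 2:
 *
 *   vcpu_id : 0 1 | 2 3 | 4 5
 *   unit_id :  0  |  2  |  4
 */
#include <stdio.h>

int main(void)
{
    unsigned int sched_granularity = 2;  /* e.g. two SMT siblings per core */

    for ( unsigned int vcpu_id = 0; vcpu_id < 6; vcpu_id++ )
    {
        /* vcpus sharing this quotient end up in the same unit ...        */
        unsigned int group = vcpu_id / sched_granularity;
        /* ... and the unit's id is the lowest vcpu_id in that group.     */
        unsigned int unit_id = group * sched_granularity;

        printf("vcpu %u -> unit_id %u\n", vcpu_id, unit_id);
    }

    return 0;
}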
Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V4: - merge patch 36 of V3 into this one (Jan Beulich) - add some comments (Jan Beulich) - use unit_id instead of vcpu_list->vcpu_id (Jan Beulich) --- xen/common/schedule.c | 97 ++++++++++++++++++++++++++++++++++++++++----------- 1 file changed, 76 insertions(+), 21 deletions(-) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 36b1d3df6e..37002b4c0e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -349,7 +349,7 @@ static void sched_spin_unlock_double(spinlock_t *lock1, spinlock_t *lock2, spin_unlock_irqrestore(lock1, flags); } -static void sched_free_unit(struct sched_unit *unit) +static void sched_free_unit_mem(struct sched_unit *unit) { struct sched_unit *prev_unit; struct domain *d = unit->domain; @@ -368,8 +368,6 @@ static void sched_free_unit(struct sched_unit *unit) } } - unit->vcpu_list->sched_unit = NULL; - free_cpumask_var(unit->cpu_hard_affinity); free_cpumask_var(unit->cpu_hard_affinity_saved); free_cpumask_var(unit->cpu_soft_affinity); @@ -377,18 +375,65 @@ static void sched_free_unit(struct sched_unit *unit) xfree(unit); } +static void sched_free_unit(struct sched_unit *unit, struct vcpu *v) +{ + struct vcpu *vunit; + unsigned int cnt = 0; + + /* Don't count to be released vcpu, might be not in vcpu list yet. */ + for_each_sched_unit_vcpu ( unit, vunit ) + if ( vunit != v ) + cnt++; + + v->sched_unit = NULL; + unit->runstate_cnt[v->runstate.state]--; + + if ( unit->vcpu_list == v ) + unit->vcpu_list = v->next_in_list; + + if ( !cnt ) + sched_free_unit_mem(unit); +} + +static void sched_unit_add_vcpu(struct sched_unit *unit, struct vcpu *v) +{ + v->sched_unit = unit; + + /* All but idle vcpus are allocated with sequential vcpu_id. */ + if ( !unit->vcpu_list || unit->vcpu_list->vcpu_id > v->vcpu_id ) + { + unit->vcpu_list = v; + /* + * unit_id is always the same as lowest vcpu_id of unit. + * This is used for stopping for_each_sched_unit_vcpu() loop and in + * order to support cpupools with different granularities. + */ + unit->unit_id = v->vcpu_id; + } + unit->runstate_cnt[v->runstate.state]++; +} + static struct sched_unit *sched_alloc_unit(struct vcpu *v) { struct sched_unit *unit, **prev_unit; struct domain *d = v->domain; + for_each_sched_unit ( d, unit ) + if ( unit->unit_id / sched_granularity == + v->vcpu_id / sched_granularity ) + break; + + if ( unit ) + { + sched_unit_add_vcpu(unit, v); + return unit; + } + if ( (unit = xzalloc(struct sched_unit)) == NULL ) return NULL; - unit->vcpu_list = v; - unit->unit_id = v->vcpu_id; unit->domain = d; - unit->runstate_cnt[v->runstate.state]++; + sched_unit_add_vcpu(unit, v); for ( prev_unit = &d->sched_unit_list; *prev_unit; prev_unit = &(*prev_unit)->next_in_list ) @@ -404,12 +449,10 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v) !zalloc_cpumask_var(&unit->cpu_soft_affinity) ) goto fail; - v->sched_unit = unit; - return unit; fail: - sched_free_unit(unit); + sched_free_unit(unit, v); return NULL; } @@ -459,21 +502,26 @@ int sched_init_vcpu(struct vcpu *v) else processor = sched_select_initial_cpu(v); - sched_set_res(unit, get_sched_res(processor)); - /* Initialise the per-vcpu timers. 
*/ spin_lock_init(&v->periodic_timer_lock); - init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, - v, v->processor); - init_timer(&v->singleshot_timer, vcpu_singleshot_timer_fn, - v, v->processor); - init_timer(&v->poll_timer, poll_timer_fn, - v, v->processor); + init_timer(&v->periodic_timer, vcpu_periodic_timer_fn, v, processor); + init_timer(&v->singleshot_timer, vcpu_singleshot_timer_fn, v, processor); + init_timer(&v->poll_timer, poll_timer_fn, v, processor); + + /* If this is not the first vcpu of the unit we are done. */ + if ( unit->priv != NULL ) + { + v->processor = processor; + return 0; + } + + /* The first vcpu of an unit can be set via sched_set_res(). */ + sched_set_res(unit, get_sched_res(processor)); unit->priv = sched_alloc_udata(dom_scheduler(d), unit, d->sched_priv); if ( unit->priv == NULL ) { - sched_free_unit(unit); + sched_free_unit(unit, v); return 1; } @@ -633,9 +681,16 @@ void sched_destroy_vcpu(struct vcpu *v) kill_timer(&v->poll_timer); if ( test_and_clear_bool(v->is_urgent) ) atomic_dec(&per_cpu(sched_urgent_count, v->processor)); - sched_remove_unit(vcpu_scheduler(v), unit); - sched_free_udata(vcpu_scheduler(v), unit->priv); - sched_free_unit(unit); + /* + * Vcpus are being destroyed top-down. So being the first vcpu of an unit + * is the same as being the only one. + */ + if ( unit->vcpu_list == v ) + { + sched_remove_unit(vcpu_scheduler(v), unit); + sched_free_udata(vcpu_scheduler(v), unit->priv); + sched_free_unit(unit, v); + } } int sched_init_domain(struct domain *d, int poolid) From patchwork Mon Sep 30 05:21:22 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166025 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 1690417D4 for ; Mon, 30 Sep 2019 05:23:13 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id F12EF20815 for ; Mon, 30 Sep 2019 05:23:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F12EF20815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8G-00023v-Uj; Mon, 30 Sep 2019 05:22:00 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8F-00022T-At for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:21:59 +0000 X-Inumbo-ID: 300de21a-e342-11e9-b588-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 300de21a-e342-11e9-b588-bc764e2007e4; Mon, 30 Sep 2019 05:21:43 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5B7C3B061; Mon, 30 Sep 2019 05:21:40 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:22 +0200 Message-Id: <20190930052135.11257-7-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> 
References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 06/19] xen/sched: add a percpu resource index X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , George Dunlap , Dario Faggioli MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Add a percpu variable holding the index of the cpu in the current sched_resource structure. This index is used to get the correct vcpu of a sched_unit on a specific cpu. For now this index will be zero for all cpus, but with core scheduling it will be possible to have higher values, too. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- RFC V2: new patch (carved out from RFC V1 patch 49) V4: - make function parameter const (Jan Beulich) --- xen/common/schedule.c | 11 +++++++++-- 1 file changed, 9 insertions(+), 2 deletions(-) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 37002b4c0e..c8e2999407 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -77,6 +77,7 @@ static void poll_timer_fn(void *data); /* This is global for now so that private implementations can reach it */ DEFINE_PER_CPU(struct scheduler *, scheduler); DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res); +static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx); /* Scratch space for cpumasks. */ DEFINE_PER_CPU(cpumask_t, cpumask_scratch); @@ -144,6 +145,12 @@ static struct scheduler sched_idle_ops = { .switch_sched = sched_idle_switch_sched, }; +static inline struct vcpu *sched_unit2vcpu_cpu(const struct sched_unit *unit, + unsigned int cpu) +{ + return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)]; +} + static inline struct scheduler *dom_scheduler(const struct domain *d) { if ( likely(d->cpupool != NULL) ) @@ -2030,7 +2037,7 @@ static void sched_slave(void) pcpu_schedule_unlock_irq(lock, cpu); - sched_context_switch(vprev, next->vcpu_list, now); + sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now); } /* @@ -2091,7 +2098,7 @@ static void schedule(void) pcpu_schedule_unlock_irq(lock, cpu); - vnext = next->vcpu_list; + vnext = sched_unit2vcpu_cpu(next, cpu); sched_context_switch(vprev, vnext, now); } From patchwork Mon Sep 30 05:21:23 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166043 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 8152313B1 for ; Mon, 30 Sep 2019 05:23:33 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 5BD9920815 for ; Mon, 30 Sep 2019 05:23:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5BD9920815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8V-0002JK-Ir; Mon, 30 Sep 2019 05:22:15 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by 
lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8U-0002Hl-Au for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:14 +0000 X-Inumbo-ID: 302ab0ac-e342-11e9-8628-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 302ab0ac-e342-11e9-8628-bc764e2007e4; Mon, 30 Sep 2019 05:21:43 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id BDD98B07B; Mon, 30 Sep 2019 05:21:40 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:23 +0200 Message-Id: <20190930052135.11257-8-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 07/19] xen/sched: add fall back to idle vcpu when scheduling unit X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Julien Grall , Jan Beulich , Dario Faggioli , Volodymyr Babchuk , =?utf-8?q?Roger_Pau_Monn?= =?utf-8?q?=C3=A9?= MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" When scheduling an unit with multiple vcpus there is no guarantee all vcpus are available (e.g. above maxvcpus or vcpu offline). Fall back to idle vcpu of the current cpu in that case. This requires to store the correct schedule_unit pointer in the idle vcpu as long as it used as fallback vcpu. In order to modify the runstates of the correct vcpus when switching schedule units merge sched_unit_runstate_change() into sched_switch_units() and loop over the affected physical cpus instead of the unit's vcpus. This in turn requires an access function to the current variable of other cpus. Today context_saved() is called in case previous and next vcpus differ when doing a context switch. With an idle vcpu being capable to be a substitute for an offline vcpu this is problematic when switching to an idle scheduling unit. An idle previous vcpu leaves us in doubt which schedule unit was active previously, so save the previous unit pointer in the per-schedule resource area. If it is NULL the unit has not changed and we don't have to set the previous unit to be not running. When running an idle vcpu in a non-idle scheduling unit use a specific guest idle loop not performing any non-softirq tasklets and livepatching in order to avoid populating the cpu caches with memory used by other domains (as far as possible). Softirqs are considered to be save. In order to avoid livepatching when going to guest idle another variant of reset_stack_and_jump() not calling check_for_livepatch_work is needed. 
Signed-off-by: Juergen Gross Acked-by: Julien Grall Reviewed-by: Dario Faggioli Acked-by: Jan Beulich --- RFC V2: - new patch (Andrew Cooper) V1: - use urgent_count to select correct idle routine (Jan Beulich) V2: - set vcpu->is_running in context_saved() - introduce reset_stack_and_jump_nolp() (Jan Beulich) - readd scrubbing (Jan Beulich, Andrew Cooper) - get_cpu_current() _NOT_ moved to include/asm-x86/current.h as the needed reference of stack_base[] results in a #include hell V3: - split context_saved() into unit_context_saved() and vcpu_context_saved() V4: - rename sd -> sr (Jan Beulich) - use unsigned int for cpu (Jan Beulich) - add comment in sched_context_switch() (Jan Beulich) - add comment before definition of get_cpu_current() (Jan Beulich) V5: - add comment (Dario Faggioli) --- xen/arch/x86/domain.c | 23 +++++ xen/common/schedule.c | 195 +++++++++++++++++++++++++++++------------- xen/include/asm-arm/current.h | 1 + xen/include/asm-x86/current.h | 19 +++- xen/include/asm-x86/smp.h | 7 ++ xen/include/xen/sched-if.h | 4 +- xen/include/xen/sched.h | 1 + 7 files changed, 187 insertions(+), 63 deletions(-) diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c index 27f99d3bcc..c8d7f491ea 100644 --- a/xen/arch/x86/domain.c +++ b/xen/arch/x86/domain.c @@ -159,6 +159,25 @@ static void idle_loop(void) } } +/* + * Idle loop for siblings in active schedule units. + * We don't do any standard idle work like tasklets or livepatching. + */ +static void guest_idle_loop(void) +{ + unsigned int cpu = smp_processor_id(); + + for ( ; ; ) + { + ASSERT(!cpu_is_offline(cpu)); + + if ( !softirq_pending(cpu) && !scrub_free_pages() && + !softirq_pending(cpu)) + sched_guest_idle(pm_idle, cpu); + do_softirq(); + } +} + void startup_cpu_idle_loop(void) { struct vcpu *v = current; @@ -172,6 +191,10 @@ void startup_cpu_idle_loop(void) static void noreturn continue_idle_domain(struct vcpu *v) { + /* Idle vcpus might be attached to non-idle units! */ + if ( !is_idle_domain(v->sched_unit->domain) ) + reset_stack_and_jump_nolp(guest_idle_loop); + reset_stack_and_jump(idle_loop); } diff --git a/xen/common/schedule.c b/xen/common/schedule.c index c8e2999407..b4c4b04ebe 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -145,10 +145,21 @@ static struct scheduler sched_idle_ops = { .switch_sched = sched_idle_switch_sched, }; +static inline struct vcpu *unit2vcpu_cpu(const struct sched_unit *unit, + unsigned int cpu) +{ + unsigned int idx = unit->unit_id + per_cpu(sched_res_idx, cpu); + const struct domain *d = unit->domain; + + return (idx < d->max_vcpus) ? d->vcpu[idx] : NULL; +} + static inline struct vcpu *sched_unit2vcpu_cpu(const struct sched_unit *unit, unsigned int cpu) { - return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)]; + struct vcpu *v = unit2vcpu_cpu(unit, cpu); + + return (v && v->new_state == RUNSTATE_running) ? 
v : idle_vcpu[cpu]; } static inline struct scheduler *dom_scheduler(const struct domain *d) @@ -268,8 +279,11 @@ static inline void vcpu_runstate_change( trace_runstate_change(v, new_state); - unit->runstate_cnt[v->runstate.state]--; - unit->runstate_cnt[new_state]++; + if ( !is_idle_vcpu(v) ) + { + unit->runstate_cnt[v->runstate.state]--; + unit->runstate_cnt[new_state]++; + } delta = new_entry_time - v->runstate.state_entry_time; if ( delta > 0 ) @@ -281,21 +295,18 @@ static inline void vcpu_runstate_change( v->runstate.state = new_state; } -static inline void sched_unit_runstate_change(struct sched_unit *unit, - bool running, s_time_t new_entry_time) +void sched_guest_idle(void (*idle) (void), unsigned int cpu) { - struct vcpu *v; - - for_each_sched_unit_vcpu ( unit, v ) - { - if ( running ) - vcpu_runstate_change(v, v->new_state, new_entry_time); - else - vcpu_runstate_change(v, - ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked : - (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)), - new_entry_time); - } + /* + * Another vcpu of the unit is active in guest context while this one is + * idle. In case of a scheduling event we don't want to have high latencies + * due to a cpu needing to wake up from deep C state for joining the + * rendezvous, so avoid those deep C states by incrementing the urgent + * count of the cpu. + */ + atomic_inc(&per_cpu(sched_urgent_count, cpu)); + idle(); + atomic_dec(&per_cpu(sched_urgent_count, cpu)); } void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate) @@ -545,6 +556,7 @@ int sched_init_vcpu(struct vcpu *v) if ( is_idle_domain(d) ) { get_sched_res(v->processor)->curr = unit; + get_sched_res(v->processor)->sched_unit_idle = unit; v->is_running = 1; unit->is_running = true; unit->state_entry_time = NOW(); @@ -877,7 +889,7 @@ static void sched_unit_move_locked(struct sched_unit *unit, * * sched_unit_migrate_finish() will do the work now if it can, or simply * return if it can't (because unit is still running); in that case - * sched_unit_migrate_finish() will be called by context_saved(). + * sched_unit_migrate_finish() will be called by unit_context_saved(). */ static void sched_unit_migrate_start(struct sched_unit *unit) { @@ -900,7 +912,7 @@ static void sched_unit_migrate_finish(struct sched_unit *unit) /* * If the unit is currently running, this will be handled by - * context_saved(); and in any case, if the bit is cleared, then + * unit_context_saved(); and in any case, if the bit is cleared, then * someone else has already done the work so we don't need to. */ if ( unit->is_running ) @@ -1785,33 +1797,66 @@ static void sched_switch_units(struct sched_resource *sr, struct sched_unit *next, struct sched_unit *prev, s_time_t now) { - sr->curr = next; - - TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, prev->unit_id, - now - prev->state_entry_time); - TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, next->unit_id, - (next->vcpu_list->runstate.state == RUNSTATE_runnable) ? 
- (now - next->state_entry_time) : 0, prev->next_time); + unsigned int cpu; ASSERT(unit_running(prev)); - TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id, - next->domain->domain_id, next->unit_id); + if ( prev != next ) + { + sr->curr = next; + sr->prev = prev; - sched_unit_runstate_change(prev, false, now); + TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, + prev->unit_id, now - prev->state_entry_time); + TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, + next->unit_id, + (next->vcpu_list->runstate.state == RUNSTATE_runnable) ? + (now - next->state_entry_time) : 0, prev->next_time); + TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->unit_id, + next->domain->domain_id, next->unit_id); - ASSERT(!unit_running(next)); - sched_unit_runstate_change(next, true, now); + ASSERT(!unit_running(next)); - /* - * NB. Don't add any trace records from here until the actual context - * switch, else lost_records resume will not work properly. - */ + /* + * NB. Don't add any trace records from here until the actual context + * switch, else lost_records resume will not work properly. + */ + + ASSERT(!next->is_running); + next->is_running = true; + next->state_entry_time = now; + + if ( is_idle_unit(prev) ) + { + prev->runstate_cnt[RUNSTATE_running] = 0; + prev->runstate_cnt[RUNSTATE_runnable] = sched_granularity; + } + if ( is_idle_unit(next) ) + { + next->runstate_cnt[RUNSTATE_running] = sched_granularity; + next->runstate_cnt[RUNSTATE_runnable] = 0; + } + } + + for_each_cpu ( cpu, sr->cpus ) + { + struct vcpu *vprev = get_cpu_current(cpu); + struct vcpu *vnext = sched_unit2vcpu_cpu(next, cpu); + + if ( vprev != vnext || vprev->runstate.state != vnext->new_state ) + { + vcpu_runstate_change(vprev, + ((vprev->pause_flags & VPF_blocked) ? RUNSTATE_blocked : + (vcpu_runnable(vprev) ? RUNSTATE_runnable : RUNSTATE_offline)), + now); + vcpu_runstate_change(vnext, vnext->new_state, now); + } - ASSERT(!next->is_running); - next->vcpu_list->is_running = 1; - next->is_running = true; - next->state_entry_time = now; + vnext->is_running = 1; + + if ( is_idle_vcpu(vnext) ) + vnext->sched_unit = next; + } } static bool sched_tasklet_check_cpu(unsigned int cpu) @@ -1867,29 +1912,39 @@ static struct sched_unit *do_schedule(struct sched_unit *prev, s_time_t now, if ( prev->next_time >= 0 ) /* -ve means no limit */ set_timer(&sr->s_timer, now + prev->next_time); - if ( likely(prev != next) ) - sched_switch_units(sr, next, prev, now); + sched_switch_units(sr, next, prev, now); return next; } -static void context_saved(struct vcpu *prev) +static void vcpu_context_saved(struct vcpu *vprev, struct vcpu *vnext) { - struct sched_unit *unit = prev->sched_unit; - /* Clear running flag /after/ writing context to memory. */ smp_wmb(); - prev->is_running = 0; + if ( vprev != vnext ) + vprev->is_running = 0; +} + +static void unit_context_saved(struct sched_resource *sr) +{ + struct sched_unit *unit = sr->prev; + + if ( !unit ) + return; + unit->is_running = false; unit->state_entry_time = NOW(); + sr->prev = NULL; /* Check for migration request /after/ clearing running flag. */ smp_mb(); - sched_context_saved(vcpu_scheduler(prev), unit); + sched_context_saved(unit_scheduler(unit), unit); - sched_unit_migrate_finish(unit); + /* Idle never migrates and idle vcpus might belong to other units. */ + if ( !is_idle_unit(unit) ) + sched_unit_migrate_finish(unit); } /* @@ -1899,35 +1954,44 @@ static void context_saved(struct vcpu *prev) * The counter will be 0 in case no rendezvous is needed. 
For the rendezvous * case it is initialised to the number of cpus to rendezvous plus 1. Each * member entering decrements the counter. The last one will decrement it to - * 1 and perform the final needed action in that case (call of context_saved() - * if vcpu was switched), and then set the counter to zero. The other members + * 1 and perform the final needed action in that case (call of + * unit_context_saved()), and then set the counter to zero. The other members * will wait until the counter becomes zero until they proceed. */ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) { struct sched_unit *next = vnext->sched_unit; + struct sched_resource *sr = get_sched_res(smp_processor_id()); if ( atomic_read(&next->rendezvous_out_cnt) ) { int cnt = atomic_dec_return(&next->rendezvous_out_cnt); - /* Call context_saved() before releasing other waiters. */ + vcpu_context_saved(vprev, vnext); + + /* Call unit_context_saved() before releasing other waiters. */ if ( cnt == 1 ) { - if ( vprev != vnext ) - context_saved(vprev); + unit_context_saved(sr); atomic_set(&next->rendezvous_out_cnt, 0); } else while ( atomic_read(&next->rendezvous_out_cnt) ) cpu_relax(); } - else if ( vprev != vnext && sched_granularity == 1 ) - context_saved(vprev); + else + { + vcpu_context_saved(vprev, vnext); + if ( sched_granularity == 1 ) + unit_context_saved(sr); + } + + if ( is_idle_vcpu(vprev) && vprev != vnext ) + vprev->sched_unit = sr->sched_unit_idle; } static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, - s_time_t now) + bool reset_idle_unit, s_time_t now) { if ( unlikely(vprev == vnext) ) { @@ -1936,6 +2000,17 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, now - vprev->runstate.state_entry_time, vprev->sched_unit->next_time); sched_context_switched(vprev, vnext); + + /* + * We are switching from a non-idle to an idle unit. + * A vcpu of the idle unit might have been running before due to + * the guest vcpu being blocked. We must adjust the unit of the idle + * vcpu which might have been set to the guest's one. + */ + if ( reset_idle_unit ) + vnext->sched_unit = + get_sched_res(smp_processor_id())->sched_unit_idle; + trace_continue_running(vnext); return continue_running(vprev); } @@ -1994,7 +2069,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, pcpu_schedule_unlock_irq(*lock, cpu); raise_softirq(SCHED_SLAVE_SOFTIRQ); - sched_context_switch(vprev, vprev, now); + sched_context_switch(vprev, vprev, false, now); return NULL; /* ARM only. 
*/ } @@ -2037,7 +2112,8 @@ static void sched_slave(void) pcpu_schedule_unlock_irq(lock, cpu); - sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now); + sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), + is_idle_unit(next) && !is_idle_unit(prev), now); } /* @@ -2099,7 +2175,8 @@ static void schedule(void) pcpu_schedule_unlock_irq(lock, cpu); vnext = sched_unit2vcpu_cpu(next, cpu); - sched_context_switch(vprev, vnext, now); + sched_context_switch(vprev, vnext, + !is_idle_unit(prev) && is_idle_unit(next), now); } /* The scheduler timer: force a run through the scheduler */ @@ -2170,6 +2247,7 @@ static int cpu_schedule_up(unsigned int cpu) */ sr->curr = idle_vcpu[cpu]->sched_unit; + sr->sched_unit_idle = idle_vcpu[cpu]->sched_unit; sr->sched_priv = NULL; @@ -2339,6 +2417,7 @@ void __init scheduler_init(void) if ( vcpu_create(idle_domain, 0) == NULL ) BUG(); get_sched_res(0)->curr = idle_vcpu[0]->sched_unit; + get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit; } /* diff --git a/xen/include/asm-arm/current.h b/xen/include/asm-arm/current.h index 1653e89d30..88beb4645a 100644 --- a/xen/include/asm-arm/current.h +++ b/xen/include/asm-arm/current.h @@ -18,6 +18,7 @@ DECLARE_PER_CPU(struct vcpu *, curr_vcpu); #define current (this_cpu(curr_vcpu)) #define set_current(vcpu) do { current = (vcpu); } while (0) +#define get_cpu_current(cpu) (per_cpu(curr_vcpu, cpu)) /* Per-VCPU state that lives at the top of the stack */ struct cpu_info { diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h index f3508c3c08..0b47485337 100644 --- a/xen/include/asm-x86/current.h +++ b/xen/include/asm-x86/current.h @@ -77,6 +77,11 @@ struct cpu_info { /* get_stack_bottom() must be 16-byte aligned */ }; +static inline struct cpu_info *get_cpu_info_from_stack(unsigned long sp) +{ + return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1; +} + static inline struct cpu_info *get_cpu_info(void) { #ifdef __clang__ @@ -87,7 +92,7 @@ static inline struct cpu_info *get_cpu_info(void) register unsigned long sp asm("rsp"); #endif - return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1; + return get_cpu_info_from_stack(sp); } #define get_current() (get_cpu_info()->current_vcpu) @@ -124,16 +129,22 @@ unsigned long get_stack_dump_bottom (unsigned long sp); # define CHECK_FOR_LIVEPATCH_WORK "" #endif -#define reset_stack_and_jump(__fn) \ +#define switch_stack_and_jump(fn, instr) \ ({ \ __asm__ __volatile__ ( \ "mov %0,%%"__OP"sp;" \ - CHECK_FOR_LIVEPATCH_WORK \ + instr \ "jmp %c1" \ - : : "r" (guest_cpu_user_regs()), "i" (__fn) : "memory" ); \ + : : "r" (guest_cpu_user_regs()), "i" (fn) : "memory" ); \ unreachable(); \ }) +#define reset_stack_and_jump(fn) \ + switch_stack_and_jump(fn, CHECK_FOR_LIVEPATCH_WORK) + +#define reset_stack_and_jump_nolp(fn) \ + switch_stack_and_jump(fn, "") + /* * Which VCPU's state is currently running on each CPU? * This is not necesasrily the same as 'current' as a CPU may be diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h index 61446d0efd..dbeed2fd41 100644 --- a/xen/include/asm-x86/smp.h +++ b/xen/include/asm-x86/smp.h @@ -77,6 +77,13 @@ void set_nr_sockets(void); /* Representing HT and core siblings in each socket. */ extern cpumask_t **socket_cpumask; +/* + * To be used only while no context switch can occur on the cpu, i.e. + * by certain scheduling code only. 
+ */ +#define get_cpu_current(cpu) \ + (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu) + #endif /* !__ASSEMBLY__ */ #endif diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 1b296b150f..41a1083a08 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -39,6 +39,8 @@ struct sched_resource { spinlock_t *schedule_lock, _lock; struct sched_unit *curr; + struct sched_unit *sched_unit_idle; + struct sched_unit *prev; void *sched_priv; struct timer s_timer; /* scheduling timer */ @@ -194,7 +196,7 @@ static inline void sched_clear_pause_flags_atomic(struct sched_unit *unit, static inline struct sched_unit *sched_idle_unit(unsigned int cpu) { - return idle_vcpu[cpu]->sched_unit; + return get_sched_res(cpu)->sched_unit_idle; } static inline unsigned int sched_get_resource_cpu(unsigned int cpu) diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index 12f00cd78d..ce4329db72 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -929,6 +929,7 @@ void restore_vcpu_affinity(struct domain *d); void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate); uint64_t get_cpu_idle_time(unsigned int cpu); +void sched_guest_idle(void (*idle) (void), unsigned int cpu); /* * Used by idle loop to decide whether there is work to do: From patchwork Mon Sep 30 05:21:24 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166033 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 0573A16C1 for ; Mon, 30 Sep 2019 05:23:19 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id D42A220815 for ; Mon, 30 Sep 2019 05:23:18 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org D42A220815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8L-00027h-Mp; Mon, 30 Sep 2019 05:22:05 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8K-00026s-B8 for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:04 +0000 X-Inumbo-ID: 302df988-e342-11e9-8628-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 302df988-e342-11e9-8628-bc764e2007e4; Mon, 30 Sep 2019 05:21:43 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 114CDB089; Mon, 30 Sep 2019 05:21:41 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:24 +0200 Message-Id: <20190930052135.11257-9-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 08/19] xen/sched: make vcpu_wake() and vcpu_sleep() core scheduling aware X-BeenThere: xen-devel@lists.xenproject.org 
X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Julien Grall , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" vcpu_wake() and vcpu_sleep() need to be made core scheduling aware: they might need to switch a single vcpu of an already scheduled unit between running and not running. Especially when vcpu_sleep() for a vcpu is being called by a vcpu of the same scheduling unit special care must be taken in order to avoid a deadlock: the vcpu to be put asleep must be forced through a context switch without doing so for the calling vcpu. For this purpose add a vcpu flag handled in sched_slave() and in sched_wait_rendezvous_in() allowing a vcpu of the currently running unit to switch state at a higher priority than a normal schedule event. Use the same mechanism when waking up a vcpu of a currently active unit. While at it make vcpu_sleep_nosync_locked() static as it is used in schedule.c only. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- RFC V2: add vcpu_sleep() handling and force_context_switch flag V2: fix runstate change in sched_force_context_switch() V4: - use unit_scheduler() where appropriate (Jan Beulich) - make cpu parameter unsigned int (Jan Beulich) - comments (Jan Beulich) - use true instead 1 for setting bool (Jan Beulich) - const parameter (Jan Beulich) V5: - add comments (Dario Faggioli) --- xen/common/schedule.c | 134 +++++++++++++++++++++++++++++++++++++++++++-- xen/include/xen/sched-if.h | 9 ++- xen/include/xen/sched.h | 2 + 3 files changed, 136 insertions(+), 9 deletions(-) diff --git a/xen/common/schedule.c b/xen/common/schedule.c index b4c4b04ebe..9442be1c83 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -751,8 +751,10 @@ void sched_destroy_domain(struct domain *d) } } -void vcpu_sleep_nosync_locked(struct vcpu *v) +static void vcpu_sleep_nosync_locked(struct vcpu *v) { + struct sched_unit *unit = v->sched_unit; + ASSERT(spin_is_locked(get_sched_res(v->processor)->schedule_lock)); if ( likely(!vcpu_runnable(v)) ) @@ -760,7 +762,15 @@ void vcpu_sleep_nosync_locked(struct vcpu *v) if ( v->runstate.state == RUNSTATE_runnable ) vcpu_runstate_change(v, RUNSTATE_offline, NOW()); - sched_sleep(vcpu_scheduler(v), v->sched_unit); + /* Only put unit to sleep in case all vcpus are not runnable. */ + if ( likely(!unit_runnable(unit)) ) + sched_sleep(unit_scheduler(unit), unit); + else if ( unit_running(unit) > 1 && v->is_running && + !v->force_context_switch ) + { + v->force_context_switch = true; + cpu_raise_softirq(v->processor, SCHED_SLAVE_SOFTIRQ); + } } } @@ -792,16 +802,27 @@ void vcpu_wake(struct vcpu *v) { unsigned long flags; spinlock_t *lock; + struct sched_unit *unit = v->sched_unit; TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id); - lock = unit_schedule_lock_irqsave(v->sched_unit, &flags); + lock = unit_schedule_lock_irqsave(unit, &flags); if ( likely(vcpu_runnable(v)) ) { if ( v->runstate.state >= RUNSTATE_blocked ) vcpu_runstate_change(v, RUNSTATE_runnable, NOW()); - sched_wake(vcpu_scheduler(v), v->sched_unit); + /* + * Call sched_wake() unconditionally, even if unit is running already. + * We might have not been de-scheduled after vcpu_sleep_nosync_locked() + * and are now to be woken up again. 
+ */ + sched_wake(unit_scheduler(unit), unit); + if ( unit->is_running && !v->is_running && !v->force_context_switch ) + { + v->force_context_switch = true; + cpu_raise_softirq(v->processor, SCHED_SLAVE_SOFTIRQ); + } } else if ( !(v->pause_flags & VPF_blocked) ) { @@ -809,7 +830,7 @@ void vcpu_wake(struct vcpu *v) vcpu_runstate_change(v, RUNSTATE_offline, NOW()); } - unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit); + unit_schedule_unlock_irqrestore(lock, flags, unit); } void vcpu_unblock(struct vcpu *v) @@ -2027,6 +2048,65 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, context_switch(vprev, vnext); } +/* + * Force a context switch of a single vcpu of an unit. + * Might be called either if a vcpu of an already running unit is woken up + * or if a vcpu of a running unit is put asleep with other vcpus of the same + * unit still running. + * Returns either NULL if v is already in the correct state or the vcpu to + * run next. + */ +static struct vcpu *sched_force_context_switch(struct vcpu *vprev, + struct vcpu *v, + unsigned int cpu, s_time_t now) +{ + v->force_context_switch = false; + + if ( vcpu_runnable(v) == v->is_running ) + return NULL; + + if ( vcpu_runnable(v) ) + { + if ( is_idle_vcpu(vprev) ) + { + vcpu_runstate_change(vprev, RUNSTATE_runnable, now); + vprev->sched_unit = get_sched_res(cpu)->sched_unit_idle; + } + vcpu_runstate_change(v, RUNSTATE_running, now); + } + else + { + /* Make sure not to switch last vcpu of an unit away. */ + if ( unit_running(v->sched_unit) == 1 ) + return NULL; + + v->new_state = vcpu_runstate_blocked(v); + vcpu_runstate_change(v, v->new_state, now); + v = sched_unit2vcpu_cpu(vprev->sched_unit, cpu); + if ( v != vprev ) + { + if ( is_idle_vcpu(vprev) ) + { + vcpu_runstate_change(vprev, RUNSTATE_runnable, now); + vprev->sched_unit = get_sched_res(cpu)->sched_unit_idle; + } + else + { + v->sched_unit = vprev->sched_unit; + vcpu_runstate_change(v, RUNSTATE_running, now); + } + } + } + + /* This vcpu will be switched to. */ + v->is_running = true; + + /* Make sure not to loose another slave call. */ + raise_softirq(SCHED_SLAVE_SOFTIRQ); + + return v; +} + /* * Rendezvous before taking a scheduling decision. * Called with schedule lock held, so all accesses to the rendezvous counter @@ -2042,6 +2122,7 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, s_time_t now) { struct sched_unit *next; + struct vcpu *v; if ( !--prev->rendezvous_in_cnt ) { @@ -2050,8 +2131,28 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, return next; } + v = unit2vcpu_cpu(prev, cpu); while ( prev->rendezvous_in_cnt ) { + if ( v && v->force_context_switch ) + { + struct vcpu *vprev = current; + + v = sched_force_context_switch(vprev, v, cpu, now); + + if ( v ) + { + /* We'll come back another time, so adjust rendezvous_in_cnt. */ + prev->rendezvous_in_cnt++; + atomic_set(&prev->rendezvous_out_cnt, 0); + + pcpu_schedule_unlock_irq(*lock, cpu); + + sched_context_switch(vprev, v, false, now); + } + + v = unit2vcpu_cpu(prev, cpu); + } /* * Coming from idle might need to do tasklet work. 
* In order to avoid deadlocks we can't do that here, but have to @@ -2086,10 +2187,11 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, static void sched_slave(void) { - struct vcpu *vprev = current; + struct vcpu *v, *vprev = current; struct sched_unit *prev = vprev->sched_unit, *next; s_time_t now; spinlock_t *lock; + bool do_softirq = false; unsigned int cpu = smp_processor_id(); ASSERT_NOT_IN_ATOMIC(); @@ -2098,9 +2200,29 @@ static void sched_slave(void) now = NOW(); + v = unit2vcpu_cpu(prev, cpu); + if ( v && v->force_context_switch ) + { + v = sched_force_context_switch(vprev, v, cpu, now); + + if ( v ) + { + pcpu_schedule_unlock_irq(lock, cpu); + + sched_context_switch(vprev, v, false, now); + } + + do_softirq = true; + } + if ( !prev->rendezvous_in_cnt ) { pcpu_schedule_unlock_irq(lock, cpu); + + /* Check for failed forced context switch. */ + if ( do_softirq ) + raise_softirq(SCHEDULE_SOFTIRQ); + return; } diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 41a1083a08..021c1d7c2c 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -102,6 +102,11 @@ static inline bool unit_runnable(const struct sched_unit *unit) return false; } +static inline int vcpu_runstate_blocked(const struct vcpu *v) +{ + return (v->pause_flags & VPF_blocked) ? RUNSTATE_blocked : RUNSTATE_offline; +} + /* * Returns whether a sched_unit is runnable and sets new_state for each of its * vcpus. It is mandatory to determine the new runstate for all vcpus of a unit @@ -121,9 +126,7 @@ static inline bool unit_runnable_state(const struct sched_unit *unit) { runnable = vcpu_runnable(v); - v->new_state = runnable ? RUNSTATE_running - : (v->pause_flags & VPF_blocked) - ? RUNSTATE_blocked : RUNSTATE_offline; + v->new_state = runnable ? RUNSTATE_running : vcpu_runstate_blocked(v); if ( runnable ) ret = true; diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index ce4329db72..f97303668a 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -186,6 +186,8 @@ struct vcpu bool is_running; /* VCPU should wake fast (do not deep sleep the CPU). */ bool is_urgent; + /* VCPU must context_switch without scheduling unit. 
*/ + bool force_context_switch; #ifdef VCPU_TRAP_LAST #define VCPU_TRAP_NONE 0 From patchwork Mon Sep 30 05:21:25 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166013 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 83DF416C1 for ; Mon, 30 Sep 2019 05:23:05 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 6A6C620815 for ; Mon, 30 Sep 2019 05:23:05 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6A6C620815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8a-0002Pk-Lm; Mon, 30 Sep 2019 05:22:20 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8Z-0002Nr-BV for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:19 +0000 X-Inumbo-ID: 30c694f4-e342-11e9-8628-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 30c694f4-e342-11e9-8628-bc764e2007e4; Mon, 30 Sep 2019 05:21:44 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5D44BB090; Mon, 30 Sep 2019 05:21:41 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:25 +0200 Message-Id: <20190930052135.11257-10-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 09/19] xen/sched: move per-cpu variable scheduler to struct sched_resource X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Julien Grall , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Having a pointer to struct scheduler in struct sched_resource instead of per cpu is enough. 
Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Dario Faggioli --- V1: new patch V4: - several renames sd -> sr (Jan Beulich) - use ops instead or sr->scheduler (Jan Beulich) --- xen/common/sched_credit.c | 18 +++++++++++------- xen/common/sched_credit2.c | 3 ++- xen/common/schedule.c | 15 +++++++-------- xen/include/xen/sched-if.h | 2 +- 4 files changed, 21 insertions(+), 17 deletions(-) diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index a6dff8ec62..86603adcb6 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -352,9 +352,10 @@ DEFINE_PER_CPU(unsigned int, last_tickle_cpu); static inline void __runq_tickle(struct csched_unit *new) { unsigned int cpu = sched_unit_master(new->unit); + struct sched_resource *sr = get_sched_res(cpu); struct sched_unit *unit = new->unit; struct csched_unit * const cur = CSCHED_UNIT(curr_on_cpu(cpu)); - struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu)); + struct csched_private *prv = CSCHED_PRIV(sr->scheduler); cpumask_t mask, idle_mask, *online; int balance_step, idlers_empty; @@ -931,7 +932,8 @@ csched_unit_acct(struct csched_private *prv, unsigned int cpu) { struct sched_unit *currunit = current->sched_unit; struct csched_unit * const svc = CSCHED_UNIT(currunit); - const struct scheduler *ops = per_cpu(scheduler, cpu); + struct sched_resource *sr = get_sched_res(cpu); + const struct scheduler *ops = sr->scheduler; ASSERT( sched_unit_master(currunit) == cpu ); ASSERT( svc->sdom != NULL ); @@ -987,8 +989,7 @@ csched_unit_acct(struct csched_private *prv, unsigned int cpu) * idlers. But, if we are here, it means there is someone running * on it, and hence the bit must be zero already. */ - ASSERT(!cpumask_test_cpu(cpu, - CSCHED_PRIV(per_cpu(scheduler, cpu))->idlers)); + ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(ops)->idlers)); cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); } } @@ -1083,6 +1084,7 @@ csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit) { struct csched_unit * const svc = CSCHED_UNIT(unit); unsigned int cpu = sched_unit_master(unit); + struct sched_resource *sr = get_sched_res(cpu); SCHED_STAT_CRANK(unit_sleep); @@ -1095,7 +1097,7 @@ csched_unit_sleep(const struct scheduler *ops, struct sched_unit *unit) * But, we are here because unit is going to sleep while running on cpu, * so the bit must be zero already. 
*/ - ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(per_cpu(scheduler, cpu))->idlers)); + ASSERT(!cpumask_test_cpu(cpu, CSCHED_PRIV(sr->scheduler)->idlers)); cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); } else if ( __unit_on_runq(svc) ) @@ -1575,8 +1577,9 @@ static void csched_tick(void *_cpu) { unsigned int cpu = (unsigned long)_cpu; + struct sched_resource *sr = get_sched_res(cpu); struct csched_pcpu *spc = CSCHED_PCPU(cpu); - struct csched_private *prv = CSCHED_PRIV(per_cpu(scheduler, cpu)); + struct csched_private *prv = CSCHED_PRIV(sr->scheduler); spc->tick++; @@ -1601,7 +1604,8 @@ csched_tick(void *_cpu) static struct csched_unit * csched_runq_steal(int peer_cpu, int cpu, int pri, int balance_step) { - const struct csched_private * const prv = CSCHED_PRIV(per_cpu(scheduler, cpu)); + struct sched_resource *sr = get_sched_res(cpu); + const struct csched_private * const prv = CSCHED_PRIV(sr->scheduler); const struct csched_pcpu * const peer_pcpu = CSCHED_PCPU(peer_cpu); struct csched_unit *speer; struct list_head *iter; diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c index d51df05887..af58ee161d 100644 --- a/xen/common/sched_credit2.c +++ b/xen/common/sched_credit2.c @@ -3268,8 +3268,9 @@ runq_candidate(struct csched2_runqueue_data *rqd, unsigned int *skipped) { struct list_head *iter, *temp; + struct sched_resource *sr = get_sched_res(cpu); struct csched2_unit *snext = NULL; - struct csched2_private *prv = csched2_priv(per_cpu(scheduler, cpu)); + struct csched2_private *prv = csched2_priv(sr->scheduler); bool yield = false, soft_aff_preempt = false; *skipped = 0; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 9442be1c83..5e9cee1f82 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -75,7 +75,6 @@ static void vcpu_singleshot_timer_fn(void *data); static void poll_timer_fn(void *data); /* This is global for now so that private implementations can reach it */ -DEFINE_PER_CPU(struct scheduler *, scheduler); DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res); static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx); @@ -200,7 +199,7 @@ static inline struct scheduler *unit_scheduler(const struct sched_unit *unit) */ ASSERT(is_idle_domain(d)); - return per_cpu(scheduler, unit->res->master_cpu); + return unit->res->scheduler; } static inline struct scheduler *vcpu_scheduler(const struct vcpu *v) @@ -1921,8 +1920,8 @@ static bool sched_tasklet_check(unsigned int cpu) static struct sched_unit *do_schedule(struct sched_unit *prev, s_time_t now, unsigned int cpu) { - struct scheduler *sched = per_cpu(scheduler, cpu); struct sched_resource *sr = get_sched_res(cpu); + struct scheduler *sched = sr->scheduler; struct sched_unit *next; /* get policy-specific decision on scheduling... */ @@ -2342,7 +2341,7 @@ static int cpu_schedule_up(unsigned int cpu) sr->cpus = cpumask_of(cpu); set_sched_res(cpu, sr); - per_cpu(scheduler, cpu) = &sched_idle_ops; + sr->scheduler = &sched_idle_ops; spin_lock_init(&sr->_lock); sr->schedule_lock = &sched_free_cpu_lock; init_timer(&sr->s_timer, s_timer_fn, NULL, cpu); @@ -2553,7 +2552,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c) { struct vcpu *idle; void *ppriv, *ppriv_old, *vpriv, *vpriv_old; - struct scheduler *old_ops = per_cpu(scheduler, cpu); + struct scheduler *old_ops = get_sched_res(cpu)->scheduler; struct scheduler *new_ops = (c == NULL) ? 
&sched_idle_ops : c->sched; struct cpupool *old_pool = per_cpu(cpupool, cpu); struct sched_resource *sd = get_sched_res(cpu); @@ -2617,7 +2616,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c) ppriv_old = sd->sched_priv; new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv); - per_cpu(scheduler, cpu) = new_ops; + sd->scheduler = new_ops; sd->sched_priv = ppriv; /* @@ -2717,7 +2716,7 @@ void sched_tick_suspend(void) struct scheduler *sched; unsigned int cpu = smp_processor_id(); - sched = per_cpu(scheduler, cpu); + sched = get_sched_res(cpu)->scheduler; sched_do_tick_suspend(sched, cpu); rcu_idle_enter(cpu); rcu_idle_timer_start(); @@ -2730,7 +2729,7 @@ void sched_tick_resume(void) rcu_idle_timer_stop(); rcu_idle_exit(cpu); - sched = per_cpu(scheduler, cpu); + sched = get_sched_res(cpu)->scheduler; sched_do_tick_resume(sched, cpu); } diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 021c1d7c2c..01821b3e5b 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -36,6 +36,7 @@ extern const cpumask_t *sched_res_mask; * as the rest of the struct. Just have the scheduler point to the * one it wants (This may be the one right in front of it).*/ struct sched_resource { + struct scheduler *scheduler; spinlock_t *schedule_lock, _lock; struct sched_unit *curr; @@ -49,7 +50,6 @@ struct sched_resource { const cpumask_t *cpus; /* cpus covered by this struct */ }; -DECLARE_PER_CPU(struct scheduler *, scheduler); DECLARE_PER_CPU(struct cpupool *, cpupool); DECLARE_PER_CPU(struct sched_resource *, sched_res); From patchwork Mon Sep 30 05:21:26 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166031 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id C1B8A13B1 for ; Mon, 30 Sep 2019 05:23:16 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A844920815 for ; Mon, 30 Sep 2019 05:23:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A844920815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8Q-0002Dt-OW; Mon, 30 Sep 2019 05:22:10 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8P-0002Bz-BJ for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:09 +0000 X-Inumbo-ID: 30c96e0e-e342-11e9-8628-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 30c96e0e-e342-11e9-8628-bc764e2007e4; Mon, 30 Sep 2019 05:21:44 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id B0E86B0B7; Mon, 30 Sep 2019 05:21:41 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:26 +0200 Message-Id: <20190930052135.11257-11-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: 
<20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 10/19] xen/sched: move per-cpu variable cpupool to struct sched_resource X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Julien Grall , Meng Xu , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Having a pointer to struct cpupool in struct sched_resource instead of per cpu is enough. Signed-off-by: Juergen Gross Reviewed-by: Jan Beulich Reviewed-by: Dario Faggioli --- V1: new patch --- xen/common/cpupool.c | 4 +--- xen/common/sched_credit.c | 2 +- xen/common/sched_rt.c | 2 +- xen/common/schedule.c | 8 ++++---- xen/include/xen/sched-if.h | 2 +- 5 files changed, 8 insertions(+), 10 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 441a26f16c..60a85f50e1 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -34,8 +34,6 @@ static cpumask_t cpupool_locked_cpus; static DEFINE_SPINLOCK(cpupool_lock); -DEFINE_PER_CPU(struct cpupool *, cpupool); - static void free_cpupool_struct(struct cpupool *c) { if ( c ) @@ -504,7 +502,7 @@ static int cpupool_cpu_add(unsigned int cpu) * (or unplugging would have failed) and that is the default behavior * anyway. */ - per_cpu(cpupool, cpu) = NULL; + get_sched_res(cpu)->cpupool = NULL; ret = cpupool_assign_cpu_locked(cpupool0, cpu); spin_unlock(&cpupool_lock); diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c index 86603adcb6..31fdcd6a2f 100644 --- a/xen/common/sched_credit.c +++ b/xen/common/sched_credit.c @@ -1681,7 +1681,7 @@ static struct csched_unit * csched_load_balance(struct csched_private *prv, int cpu, struct csched_unit *snext, bool *stolen) { - struct cpupool *c = per_cpu(cpupool, cpu); + struct cpupool *c = get_sched_res(cpu)->cpupool; struct csched_unit *speer; cpumask_t workers; cpumask_t *online; diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c index d21c416cae..6e93e50acb 100644 --- a/xen/common/sched_rt.c +++ b/xen/common/sched_rt.c @@ -774,7 +774,7 @@ rt_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu) if ( prv->repl_timer.cpu == cpu ) { - struct cpupool *c = per_cpu(cpupool, cpu); + struct cpupool *c = get_sched_res(cpu)->cpupool; unsigned int new_cpu = cpumask_cycle(cpu, cpupool_online_cpumask(c)); /* diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 5e9cee1f82..249ff8a882 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -1120,7 +1120,7 @@ int cpu_disable_scheduler(unsigned int cpu) cpumask_t online_affinity; int ret = 0; - c = per_cpu(cpupool, cpu); + c = get_sched_res(cpu)->cpupool; if ( c == NULL ) return ret; @@ -1189,7 +1189,7 @@ static int cpu_disable_scheduler_check(unsigned int cpu) struct vcpu *v; struct cpupool *c; - c = per_cpu(cpupool, cpu); + c = get_sched_res(cpu)->cpupool; if ( c == NULL ) return 0; @@ -2554,8 +2554,8 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c) void *ppriv, *ppriv_old, *vpriv, *vpriv_old; struct scheduler *old_ops = get_sched_res(cpu)->scheduler; struct scheduler *new_ops = (c == NULL) ? 
&sched_idle_ops : c->sched; - struct cpupool *old_pool = per_cpu(cpupool, cpu); struct sched_resource *sd = get_sched_res(cpu); + struct cpupool *old_pool = sd->cpupool; spinlock_t *old_lock, *new_lock; unsigned long flags; @@ -2637,7 +2637,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c) sched_free_udata(old_ops, vpriv_old); sched_free_pdata(old_ops, ppriv_old, cpu); - per_cpu(cpupool, cpu) = c; + get_sched_res(cpu)->cpupool = c; /* When a cpu is added to a pool, trigger it to go pick up some work */ if ( c != NULL ) cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 01821b3e5b..e675061290 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -37,6 +37,7 @@ extern const cpumask_t *sched_res_mask; * one it wants (This may be the one right in front of it).*/ struct sched_resource { struct scheduler *scheduler; + struct cpupool *cpupool; spinlock_t *schedule_lock, _lock; struct sched_unit *curr; @@ -50,7 +51,6 @@ struct sched_resource { const cpumask_t *cpus; /* cpus covered by this struct */ }; -DECLARE_PER_CPU(struct cpupool *, cpupool); DECLARE_PER_CPU(struct sched_resource *, sched_res); static inline struct sched_resource *get_sched_res(unsigned int cpu) From patchwork Mon Sep 30 05:21:27 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166017 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 267AC13B1 for ; Mon, 30 Sep 2019 05:23:07 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 0C89520815 for ; Mon, 30 Sep 2019 05:23:07 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 0C89520815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8E-00021b-Cj; Mon, 30 Sep 2019 05:21:58 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8D-00020w-IR for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:21:57 +0000 X-Inumbo-ID: 3121f678-e342-11e9-96c8-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 3121f678-e342-11e9-96c8-12813bfff9fa; Mon, 30 Sep 2019 05:21:44 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 0C9A3B0CC; Mon, 30 Sep 2019 05:21:42 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:27 +0200 Message-Id: <20190930052135.11257-12-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 11/19] xen/sched: reject switching smt on/off with core scheduling active X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 
Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Julien Grall , Jan Beulich , Dario Faggioli , =?utf-8?q?Roger_Pau_Monn=C3=A9?= MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" When core or socket scheduling is active, enabling or disabling smt is not possible, as that would require a major host reconfiguration. Add a bool sched_disable_smt_switching which will be set for core or socket scheduling. Signed-off-by: Juergen Gross Acked-by: Jan Beulich Acked-by: Dario Faggioli --- V1: - new patch V2: - EBUSY as return code (Jan Beulich, Dario Faggioli) - __read_mostly for sched_disable_smt_switching (Jan Beulich) --- xen/arch/x86/sysctl.c | 5 +++++ xen/common/schedule.c | 1 + xen/include/xen/sched.h | 1 + 3 files changed, 7 insertions(+) diff --git a/xen/arch/x86/sysctl.c b/xen/arch/x86/sysctl.c index 3742ede61b..4a76f0f47f 100644 --- a/xen/arch/x86/sysctl.c +++ b/xen/arch/x86/sysctl.c @@ -209,6 +209,11 @@ long arch_do_sysctl( ret = -EOPNOTSUPP; break; } + if ( sched_disable_smt_switching ) + { + ret = -EBUSY; + break; + } plug = op == XEN_SYSCTL_CPU_HOTPLUG_SMT_ENABLE; fn = smt_up_down_helper; hcpu = _p(plug); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 249ff8a882..0dcf004d78 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -63,6 +63,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us); /* Number of vcpus per struct sched_unit. */ static unsigned int __read_mostly sched_granularity = 1; +bool __read_mostly sched_disable_smt_switching; const cpumask_t *sched_res_mask = &cpumask_all; /* Common lock for free cpus.
*/ diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index f97303668a..aa8257edc9 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -1037,6 +1037,7 @@ static inline bool is_iommu_enabled(const struct domain *d) } extern bool sched_smt_power_savings; +extern bool sched_disable_smt_switching; extern enum cpufreq_controller { FREQCTL_none, FREQCTL_dom0_kernel, FREQCTL_xen From patchwork Mon Sep 30 05:21:28 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166019 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 3AC7B16C1 for ; Mon, 30 Sep 2019 05:23:10 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 2063C20815 for ; Mon, 30 Sep 2019 05:23:10 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2063C20815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8f-0002WB-HP; Mon, 30 Sep 2019 05:22:25 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8e-0002UQ-DL for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:24 +0000 X-Inumbo-ID: 31227d78-e342-11e9-b588-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 31227d78-e342-11e9-b588-bc764e2007e4; Mon, 30 Sep 2019 05:21:44 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 5C1DCB0E2; Mon, 30 Sep 2019 05:21:42 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:28 +0200 Message-Id: <20190930052135.11257-13-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 12/19] xen/sched: prepare per-cpupool scheduling granularity X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Julien Grall , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" On- and offlining cpus with core scheduling is rather complicated as the cpus are taken on- or offline one by one, but scheduling wants them rather to be handled per core. As the future plan is to be able to select scheduling granularity per cpupool prepare that by storing the granularity in struct sched_resource (we need it there for free cpus which are not associated to any cpupool). Free cpus will always use granularity 1. 
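As a rough illustration of what the per-unit granularity means for unit allocation (a standalone sketch, not part of the patch; units_for_domain() is a made-up helper mirroring the DIV_ROUND_UP() use in sched_move_domain() below):

#include <stdio.h>

#define DIV_ROUND_UP(x, y) (((x) + (y) - 1) / (y))

/* Number of struct sched_unit instances a domain needs. */
static unsigned int units_for_domain(unsigned int max_vcpus, unsigned int gran)
{
    return DIV_ROUND_UP(max_vcpus, gran);
}

int main(void)
{
    /* A 6-vcpu domain: 6 units at cpu granularity, 3 at core granularity (SMT-2). */
    printf("%u %u\n", units_for_domain(6, 1), units_for_domain(6, 2));
    return 0;
}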
Store the selected granularity option (cpu, core or socket) in the cpupool, as we will need it to select the appropriate cpu mask when populating the cpupool with cpus. This will make on- and offlining of cpus much easier and avoids writing code which would need to be thrown away later. Move the granularity related variables to cpupool.c as they are now used from there only. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V1: new patch V4: - move opt_sched_granularity and sched_granularity to cpupool.c (Jan Beulich) - rename c->opt_sched_granularity, drop c->granularity (Jan Beulich) --- xen/common/cpupool.c | 9 +++++++++ xen/common/schedule.c | 27 ++++++++++++++++----------- xen/include/xen/sched-if.h | 11 +++++++++++ 3 files changed, 36 insertions(+), 11 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 60a85f50e1..51f0ff0d88 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -34,6 +34,14 @@ static cpumask_t cpupool_locked_cpus; static DEFINE_SPINLOCK(cpupool_lock); +static enum sched_gran __read_mostly opt_sched_granularity = SCHED_GRAN_cpu; +static unsigned int __read_mostly sched_granularity = 1; + +unsigned int cpupool_get_granularity(const struct cpupool *c) +{ + return c ? sched_granularity : 1; +} + static void free_cpupool_struct(struct cpupool *c) { if ( c ) @@ -173,6 +181,7 @@ static struct cpupool *cpupool_create( return NULL; } } + c->gran = opt_sched_granularity; *q = c; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 0dcf004d78..5257225050 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -62,7 +62,6 @@ int sched_ratelimit_us = SCHED_DEFAULT_RATELIMIT_US; integer_param("sched_ratelimit_us", sched_ratelimit_us); /* Number of vcpus per struct sched_unit.
*/ -static unsigned int __read_mostly sched_granularity = 1; bool __read_mostly sched_disable_smt_switching; const cpumask_t *sched_res_mask = &cpumask_all; @@ -435,10 +434,10 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v) { struct sched_unit *unit, **prev_unit; struct domain *d = v->domain; + unsigned int gran = cpupool_get_granularity(d->cpupool); for_each_sched_unit ( d, unit ) - if ( unit->unit_id / sched_granularity == - v->vcpu_id / sched_granularity ) + if ( unit->unit_id / gran == v->vcpu_id / gran ) break; if ( unit ) @@ -593,6 +592,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) void *unitdata; struct scheduler *old_ops; void *old_domdata; + unsigned int gran = cpupool_get_granularity(c); for_each_vcpu ( d, v ) { @@ -604,8 +604,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) if ( IS_ERR(domdata) ) return PTR_ERR(domdata); - unit_priv = xzalloc_array(void *, - DIV_ROUND_UP(d->max_vcpus, sched_granularity)); + unit_priv = xzalloc_array(void *, DIV_ROUND_UP(d->max_vcpus, gran)); if ( unit_priv == NULL ) { sched_free_domdata(c->sched, domdata); @@ -1850,11 +1849,11 @@ static void sched_switch_units(struct sched_resource *sr, if ( is_idle_unit(prev) ) { prev->runstate_cnt[RUNSTATE_running] = 0; - prev->runstate_cnt[RUNSTATE_runnable] = sched_granularity; + prev->runstate_cnt[RUNSTATE_runnable] = sr->granularity; } if ( is_idle_unit(next) ) { - next->runstate_cnt[RUNSTATE_running] = sched_granularity; + next->runstate_cnt[RUNSTATE_running] = sr->granularity; next->runstate_cnt[RUNSTATE_runnable] = 0; } } @@ -2003,7 +2002,7 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) else { vcpu_context_saved(vprev, vnext); - if ( sched_granularity == 1 ) + if ( sr->granularity == 1 ) unit_context_saved(sr); } @@ -2123,11 +2122,12 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, { struct sched_unit *next; struct vcpu *v; + unsigned int gran = get_sched_res(cpu)->granularity; if ( !--prev->rendezvous_in_cnt ) { next = do_schedule(prev, now, cpu); - atomic_set(&next->rendezvous_out_cnt, sched_granularity + 1); + atomic_set(&next->rendezvous_out_cnt, gran + 1); return next; } @@ -2251,6 +2251,7 @@ static void schedule(void) struct sched_resource *sr; spinlock_t *lock; int cpu = smp_processor_id(); + unsigned int gran = get_sched_res(cpu)->granularity; ASSERT_NOT_IN_ATOMIC(); @@ -2276,11 +2277,11 @@ static void schedule(void) now = NOW(); - if ( sched_granularity > 1 ) + if ( gran > 1 ) { cpumask_t mask; - prev->rendezvous_in_cnt = sched_granularity; + prev->rendezvous_in_cnt = gran; cpumask_andnot(&mask, sr->cpus, cpumask_of(cpu)); cpumask_raise_softirq(&mask, SCHED_SLAVE_SOFTIRQ); next = sched_wait_rendezvous_in(prev, &lock, cpu, now); @@ -2348,6 +2349,9 @@ static int cpu_schedule_up(unsigned int cpu) init_timer(&sr->s_timer, s_timer_fn, NULL, cpu); atomic_set(&per_cpu(sched_urgent_count, cpu), 0); + /* We start with cpu granularity. */ + sr->granularity = 1; + /* Boot CPU is dealt with later in scheduler_init(). 
*/ if ( cpu == 0 ) return 0; @@ -2638,6 +2642,7 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c) sched_free_udata(old_ops, vpriv_old); sched_free_pdata(old_ops, ppriv_old, cpu); + get_sched_res(cpu)->granularity = cpupool_get_granularity(c); get_sched_res(cpu)->cpupool = c; /* When a cpu is added to a pool, trigger it to go pick up some work */ if ( c != NULL ) diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index e675061290..f8f0f484cb 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -25,6 +25,13 @@ extern int sched_ratelimit_us; /* Scheduling resource mask. */ extern const cpumask_t *sched_res_mask; +/* Number of vcpus per struct sched_unit. */ +enum sched_gran { + SCHED_GRAN_cpu, + SCHED_GRAN_core, + SCHED_GRAN_socket +}; + /* * In order to allow a scheduler to remap the lock->cpu mapping, * we have a per-cpu pointer, along with a pre-allocated set of @@ -48,6 +55,7 @@ struct sched_resource { /* Cpu with lowest id in scheduling resource. */ unsigned int master_cpu; + unsigned int granularity; const cpumask_t *cpus; /* cpus covered by this struct */ }; @@ -546,6 +554,7 @@ struct cpupool struct cpupool *next; struct scheduler *sched; atomic_t refcnt; + enum sched_gran gran; }; #define cpupool_online_cpumask(_pool) \ @@ -561,6 +570,8 @@ static inline cpumask_t *cpupool_domain_master_cpumask(const struct domain *d) return d->cpupool->res_valid; } +unsigned int cpupool_get_granularity(const struct cpupool *c); + /* * Hard and soft affinity load balancing. * From patchwork Mon Sep 30 05:21:29 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166029 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A213916C1 for ; Mon, 30 Sep 2019 05:23:16 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 7D40820815 for ; Mon, 30 Sep 2019 05:23:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 7D40820815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8J-00025t-Az; Mon, 30 Sep 2019 05:22:03 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8I-00025I-GT for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:02 +0000 X-Inumbo-ID: 3171ab00-e342-11e9-96c8-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 3171ab00-e342-11e9-96c8-12813bfff9fa; Mon, 30 Sep 2019 05:21:45 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id A3FDEB0F2; Mon, 30 Sep 2019 05:21:42 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:29 +0200 Message-Id: <20190930052135.11257-14-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: 
<20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 13/19] xen/sched: split schedule_cpu_switch() X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Julien Grall , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Instead of letting schedule_cpu_switch() handle moving cpus from and to cpupools, split it into schedule_cpu_add() and schedule_cpu_rm(). This will allow us to drop allocating/freeing scheduler data for free cpus as the idle scheduler doesn't need such data. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V1: new patch V4: - rename sd -> sr (Jan Beulich) --- xen/common/cpupool.c | 4 +- xen/common/schedule.c | 133 +++++++++++++++++++++++++++--------------------- xen/include/xen/sched.h | 3 +- 3 files changed, 78 insertions(+), 62 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 51f0ff0d88..02825e779d 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -271,7 +271,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) if ( (cpupool_moving_cpu == cpu) && (c != cpupool_cpu_moving) ) return -EADDRNOTAVAIL; - ret = schedule_cpu_switch(cpu, c); + ret = schedule_cpu_add(cpu, c); if ( ret ) return ret; @@ -321,7 +321,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c) */ if ( !ret ) { - ret = schedule_cpu_switch(cpu, NULL); + ret = schedule_cpu_rm(cpu); if ( ret ) cpumask_clear_cpu(cpu, &cpupool_free_cpus); else diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 5257225050..a96fc82282 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -93,15 +93,6 @@ static struct scheduler __read_mostly ops; static void sched_set_affinity( struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft); -static spinlock_t * -sched_idle_switch_sched(struct scheduler *new_ops, unsigned int cpu, - void *pdata, void *vdata) -{ - sched_idle_unit(cpu)->priv = NULL; - - return &sched_free_cpu_lock; -} - static struct sched_resource * sched_idle_res_pick(const struct scheduler *ops, const struct sched_unit *unit) { @@ -141,7 +132,6 @@ static struct scheduler sched_idle_ops = { .alloc_udata = sched_idle_alloc_udata, .free_udata = sched_idle_free_udata, - .switch_sched = sched_idle_switch_sched, }; static inline struct vcpu *unit2vcpu_cpu(const struct sched_unit *unit, @@ -2547,36 +2537,22 @@ void __init scheduler_init(void) } /* - * Move a pCPU outside of the influence of the scheduler of its current - * cpupool, or subject it to the scheduler of a new cpupool. - * - * For the pCPUs that are removed from their cpupool, their scheduler becomes - * &sched_idle_ops (the idle scheduler). + * Move a pCPU from free cpus (running the idle scheduler) to a cpupool + * using any "real" scheduler. + * The cpu is still marked as "free" and not yet valid for its cpupool. */ -int schedule_cpu_switch(unsigned int cpu, struct cpupool *c) +int schedule_cpu_add(unsigned int cpu, struct cpupool *c) { struct vcpu *idle; - void *ppriv, *ppriv_old, *vpriv, *vpriv_old; - struct scheduler *old_ops = get_sched_res(cpu)->scheduler; - struct scheduler *new_ops = (c == NULL) ? 
&sched_idle_ops : c->sched; - struct sched_resource *sd = get_sched_res(cpu); - struct cpupool *old_pool = sd->cpupool; + void *ppriv, *vpriv; + struct scheduler *new_ops = c->sched; + struct sched_resource *sr = get_sched_res(cpu); spinlock_t *old_lock, *new_lock; unsigned long flags; - /* - * pCPUs only move from a valid cpupool to free (i.e., out of any pool), - * or from free to a valid cpupool. In the former case (which happens when - * c is NULL), we want the CPU to have been marked as free already, as - * well as to not be valid for the source pool any longer, when we get to - * here. In the latter case (which happens when c is a valid cpupool), we - * want the CPU to still be marked as free, as well as to not yet be valid - * for the destination pool. - */ - ASSERT(c != old_pool && (c != NULL || old_pool != NULL)); ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); - ASSERT((c == NULL && !cpumask_test_cpu(cpu, old_pool->cpu_valid)) || - (c != NULL && !cpumask_test_cpu(cpu, c->cpu_valid))); + ASSERT(!cpumask_test_cpu(cpu, c->cpu_valid)); + ASSERT(get_sched_res(cpu)->cpupool == NULL); /* * To setup the cpu for the new scheduler we need: @@ -2601,52 +2577,91 @@ int schedule_cpu_switch(unsigned int cpu, struct cpupool *c) return -ENOMEM; } - sched_do_tick_suspend(old_ops, cpu); - /* - * The actual switch, including (if necessary) the rerouting of the - * scheduler lock to whatever new_ops prefers, needs to happen in one - * critical section, protected by old_ops' lock, or races are possible. - * It is, in fact, the lock of another scheduler that we are taking (the - * scheduler of the cpupool that cpu still belongs to). But that is ok - * as, anyone trying to schedule on this cpu will spin until when we - * release that lock (bottom of this function). When he'll get the lock - * --thanks to the loop inside *_schedule_lock() functions-- he'll notice - * that the lock itself changed, and retry acquiring the new one (which - * will be the correct, remapped one, at that point). + * The actual switch, including the rerouting of the scheduler lock to + * whatever new_ops prefers, needs to happen in one critical section, + * protected by old_ops' lock, or races are possible. + * It is, in fact, the lock of the idle scheduler that we are taking. + * But that is ok as anyone trying to schedule on this cpu will spin until + * when we release that lock (bottom of this function). When he'll get the + * lock --thanks to the loop inside *_schedule_lock() functions-- he'll + * notice that the lock itself changed, and retry acquiring the new one + * (which will be the correct, remapped one, at that point). */ old_lock = pcpu_schedule_lock_irqsave(cpu, &flags); - vpriv_old = idle->sched_unit->priv; - ppriv_old = sd->sched_priv; new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv); - sd->scheduler = new_ops; - sd->sched_priv = ppriv; + sr->scheduler = new_ops; + sr->sched_priv = ppriv; /* - * The data above is protected under new_lock, which may be unlocked. - * Another CPU can take new_lock as soon as sd->schedule_lock is visible, - * and must observe all prior initialisation. + * Reroute the lock to the per pCPU lock as /last/ thing. In fact, + * if it is free (and it can be) we want that anyone that manages + * taking it, finds all the initializations we've done above in place. */ smp_wmb(); - sd->schedule_lock = new_lock; + sr->schedule_lock = new_lock; - /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */ + /* _Not_ pcpu_schedule_unlock(): schedule_lock has changed! 
*/ spin_unlock_irqrestore(old_lock, flags); sched_do_tick_resume(new_ops, cpu); + sr->granularity = cpupool_get_granularity(c); + sr->cpupool = c; + /* The cpu is added to a pool, trigger it to go pick up some work */ + cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); + + return 0; +} + +/* + * Remove a pCPU from its cpupool. Its scheduler becomes &sched_idle_ops + * (the idle scheduler). + * The cpu is already marked as "free" and not valid any longer for its + * cpupool. + */ +int schedule_cpu_rm(unsigned int cpu) +{ + struct vcpu *idle; + void *ppriv_old, *vpriv_old; + struct sched_resource *sr = get_sched_res(cpu); + struct scheduler *old_ops = sr->scheduler; + spinlock_t *old_lock; + unsigned long flags; + + ASSERT(sr->cpupool != NULL); + ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); + ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid)); + + idle = idle_vcpu[cpu]; + + sched_do_tick_suspend(old_ops, cpu); + + /* See comment in schedule_cpu_add() regarding lock switching. */ + old_lock = pcpu_schedule_lock_irqsave(cpu, &flags); + + vpriv_old = idle->sched_unit->priv; + ppriv_old = sr->sched_priv; + + idle->sched_unit->priv = NULL; + sr->scheduler = &sched_idle_ops; + sr->sched_priv = NULL; + + smp_mb(); + sr->schedule_lock = &sched_free_cpu_lock; + + /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */ + spin_unlock_irqrestore(old_lock, flags); + sched_deinit_pdata(old_ops, ppriv_old, cpu); sched_free_udata(old_ops, vpriv_old); sched_free_pdata(old_ops, ppriv_old, cpu); - get_sched_res(cpu)->granularity = cpupool_get_granularity(c); - get_sched_res(cpu)->cpupool = c; - /* When a cpu is added to a pool, trigger it to go pick up some work */ - if ( c != NULL ) - cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); + sr->granularity = 1; + sr->cpupool = NULL; return 0; } diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index aa8257edc9..a40bd5fb56 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -920,7 +920,8 @@ struct scheduler; struct scheduler *scheduler_get_default(void); struct scheduler *scheduler_alloc(unsigned int sched_id, int *perr); void scheduler_free(struct scheduler *sched); -int schedule_cpu_switch(unsigned int cpu, struct cpupool *c); +int schedule_cpu_add(unsigned int cpu, struct cpupool *c); +int schedule_cpu_rm(unsigned int cpu); void vcpu_set_periodic_timer(struct vcpu *v, s_time_t value); int cpu_disable_scheduler(unsigned int cpu); void sched_setup_dom0_vcpus(struct domain *d); From patchwork Mon Sep 30 05:21:30 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166037 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id B825113B1 for ; Mon, 30 Sep 2019 05:23:23 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 90FE420815 for ; Mon, 30 Sep 2019 05:23:23 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 90FE420815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp 
(Exim 4.89) (envelope-from ) id 1iEo8p-0002jc-Kk; Mon, 30 Sep 2019 05:22:35 +0000 Received: from us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8o-0002i7-CY for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:34 +0000 X-Inumbo-ID: 3171eea8-e342-11e9-bf31-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 3171eea8-e342-11e9-bf31-bc764e2007e4; Mon, 30 Sep 2019 05:21:45 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 02560B112; Mon, 30 Sep 2019 05:21:43 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:30 +0200 Message-Id: <20190930052135.11257-15-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 14/19] xen/sched: protect scheduling resource via rcu X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Julien Grall , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" In order to be able to move cpus to cpupools with core scheduling active it is mandatory to merge multiple cpus into one scheduling resource or to split a scheduling resource with multiple cpus in it into multiple scheduling resources. This in turn requires to modify the cpu <-> scheduling resource relation. In order to be able to free unused resources protect struct sched_resource via RCU. This ensures there are no users left when freeing such a resource. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V1: new patch --- xen/common/cpupool.c | 4 + xen/common/schedule.c | 187 ++++++++++++++++++++++++++++++++++++++++----- xen/include/xen/sched-if.h | 7 +- 3 files changed, 178 insertions(+), 20 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 02825e779d..7228ca84b4 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -511,8 +511,10 @@ static int cpupool_cpu_add(unsigned int cpu) * (or unplugging would have failed) and that is the default behavior * anyway. */ + rcu_read_lock(&sched_res_rculock); get_sched_res(cpu)->cpupool = NULL; ret = cpupool_assign_cpu_locked(cpupool0, cpu); + rcu_read_unlock(&sched_res_rculock); spin_unlock(&cpupool_lock); @@ -597,7 +599,9 @@ static void cpupool_cpu_remove_forced(unsigned int cpu) } } + rcu_read_lock(&sched_res_rculock); sched_rm_cpu(cpu); + rcu_read_unlock(&sched_res_rculock); } /* diff --git a/xen/common/schedule.c b/xen/common/schedule.c index a96fc82282..1f23bf0e83 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -77,6 +77,7 @@ static void poll_timer_fn(void *data); /* This is global for now so that private implementations can reach it */ DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res); static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx); +DEFINE_RCU_READ_LOCK(sched_res_rculock); /* Scratch space for cpumasks. 
*/ DEFINE_PER_CPU(cpumask_t, cpumask_scratch); @@ -300,10 +301,12 @@ void sched_guest_idle(void (*idle) (void), unsigned int cpu) void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate) { - spinlock_t *lock = likely(v == current) - ? NULL : unit_schedule_lock_irq(v->sched_unit); + spinlock_t *lock; s_time_t delta; + rcu_read_lock(&sched_res_rculock); + + lock = likely(v == current) ? NULL : unit_schedule_lock_irq(v->sched_unit); memcpy(runstate, &v->runstate, sizeof(*runstate)); delta = NOW() - runstate->state_entry_time; if ( delta > 0 ) @@ -311,6 +314,8 @@ void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate) if ( unlikely(lock != NULL) ) unit_schedule_unlock_irq(lock, v->sched_unit); + + rcu_read_unlock(&sched_res_rculock); } uint64_t get_cpu_idle_time(unsigned int cpu) @@ -522,6 +527,8 @@ int sched_init_vcpu(struct vcpu *v) return 0; } + rcu_read_lock(&sched_res_rculock); + /* The first vcpu of an unit can be set via sched_set_res(). */ sched_set_res(unit, get_sched_res(processor)); @@ -529,6 +536,7 @@ int sched_init_vcpu(struct vcpu *v) if ( unit->priv == NULL ) { sched_free_unit(unit, v); + rcu_read_unlock(&sched_res_rculock); return 1; } @@ -555,6 +563,8 @@ int sched_init_vcpu(struct vcpu *v) sched_insert_unit(dom_scheduler(d), unit); } + rcu_read_unlock(&sched_res_rculock); + return 0; } @@ -583,6 +593,7 @@ int sched_move_domain(struct domain *d, struct cpupool *c) struct scheduler *old_ops; void *old_domdata; unsigned int gran = cpupool_get_granularity(c); + int ret = 0; for_each_vcpu ( d, v ) { @@ -590,15 +601,21 @@ int sched_move_domain(struct domain *d, struct cpupool *c) return -EBUSY; } + rcu_read_lock(&sched_res_rculock); + domdata = sched_alloc_domdata(c->sched, d); if ( IS_ERR(domdata) ) - return PTR_ERR(domdata); + { + ret = PTR_ERR(domdata); + goto out; + } unit_priv = xzalloc_array(void *, DIV_ROUND_UP(d->max_vcpus, gran)); if ( unit_priv == NULL ) { sched_free_domdata(c->sched, domdata); - return -ENOMEM; + ret = -ENOMEM; + goto out; } unit_idx = 0; @@ -611,7 +628,8 @@ int sched_move_domain(struct domain *d, struct cpupool *c) sched_free_udata(c->sched, unit_priv[unit_idx]); xfree(unit_priv); sched_free_domdata(c->sched, domdata); - return -ENOMEM; + ret = -ENOMEM; + goto out; } unit_idx++; } @@ -677,7 +695,10 @@ int sched_move_domain(struct domain *d, struct cpupool *c) xfree(unit_priv); - return 0; +out: + rcu_read_unlock(&sched_res_rculock); + + return ret; } void sched_destroy_vcpu(struct vcpu *v) @@ -695,9 +716,13 @@ void sched_destroy_vcpu(struct vcpu *v) */ if ( unit->vcpu_list == v ) { + rcu_read_lock(&sched_res_rculock); + sched_remove_unit(vcpu_scheduler(v), unit); sched_free_udata(vcpu_scheduler(v), unit->priv); sched_free_unit(unit, v); + + rcu_read_unlock(&sched_res_rculock); } } @@ -715,7 +740,12 @@ int sched_init_domain(struct domain *d, int poolid) SCHED_STAT_CRANK(dom_init); TRACE_1D(TRC_SCHED_DOM_ADD, d->domain_id); + rcu_read_lock(&sched_res_rculock); + sdom = sched_alloc_domdata(dom_scheduler(d), d); + + rcu_read_unlock(&sched_res_rculock); + if ( IS_ERR(sdom) ) return PTR_ERR(sdom); @@ -733,9 +763,13 @@ void sched_destroy_domain(struct domain *d) SCHED_STAT_CRANK(dom_destroy); TRACE_1D(TRC_SCHED_DOM_REM, d->domain_id); + rcu_read_lock(&sched_res_rculock); + sched_free_domdata(dom_scheduler(d), d->sched_priv); d->sched_priv = NULL; + rcu_read_unlock(&sched_res_rculock); + cpupool_rm_domain(d); } } @@ -770,11 +804,15 @@ void vcpu_sleep_nosync(struct vcpu *v) TRACE_2D(TRC_SCHED_SLEEP, v->domain->domain_id, 
v->vcpu_id); + rcu_read_lock(&sched_res_rculock); + lock = unit_schedule_lock_irqsave(v->sched_unit, &flags); vcpu_sleep_nosync_locked(v); unit_schedule_unlock_irqrestore(lock, flags, v->sched_unit); + + rcu_read_unlock(&sched_res_rculock); } void vcpu_sleep_sync(struct vcpu *v) @@ -795,6 +833,8 @@ void vcpu_wake(struct vcpu *v) TRACE_2D(TRC_SCHED_WAKE, v->domain->domain_id, v->vcpu_id); + rcu_read_lock(&sched_res_rculock); + lock = unit_schedule_lock_irqsave(unit, &flags); if ( likely(vcpu_runnable(v)) ) @@ -820,6 +860,8 @@ void vcpu_wake(struct vcpu *v) } unit_schedule_unlock_irqrestore(lock, flags, unit); + + rcu_read_unlock(&sched_res_rculock); } void vcpu_unblock(struct vcpu *v) @@ -853,6 +895,8 @@ static void sched_unit_move_locked(struct sched_unit *unit, unsigned int old_cpu = unit->res->master_cpu; struct vcpu *v; + rcu_read_lock(&sched_res_rculock); + /* * Transfer urgency status to new CPU before switching CPUs, as * once the switch occurs, v->is_urgent is no longer protected by @@ -872,6 +916,8 @@ static void sched_unit_move_locked(struct sched_unit *unit, * pointer can't change while the current lock is held. */ sched_migrate(unit_scheduler(unit), unit, new_cpu); + + rcu_read_unlock(&sched_res_rculock); } /* @@ -1039,6 +1085,8 @@ void restore_vcpu_affinity(struct domain *d) ASSERT(system_state == SYS_STATE_resume); + rcu_read_lock(&sched_res_rculock); + for_each_sched_unit ( d, unit ) { spinlock_t *lock; @@ -1095,6 +1143,8 @@ void restore_vcpu_affinity(struct domain *d) sched_move_irqs(unit); } + rcu_read_unlock(&sched_res_rculock); + domain_update_node_affinity(d); } @@ -1110,9 +1160,11 @@ int cpu_disable_scheduler(unsigned int cpu) cpumask_t online_affinity; int ret = 0; + rcu_read_lock(&sched_res_rculock); + c = get_sched_res(cpu)->cpupool; if ( c == NULL ) - return ret; + goto out; for_each_domain_in_cpupool ( d, c ) { @@ -1170,6 +1222,9 @@ int cpu_disable_scheduler(unsigned int cpu) } } +out: + rcu_read_unlock(&sched_res_rculock); + return ret; } @@ -1201,7 +1256,9 @@ static int cpu_disable_scheduler_check(unsigned int cpu) static void sched_set_affinity( struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft) { + rcu_read_lock(&sched_res_rculock); sched_adjust_affinity(dom_scheduler(unit->domain), unit, hard, soft); + rcu_read_unlock(&sched_res_rculock); if ( hard ) cpumask_copy(unit->cpu_hard_affinity, hard); @@ -1221,6 +1278,8 @@ static int vcpu_set_affinity( spinlock_t *lock; int ret = 0; + rcu_read_lock(&sched_res_rculock); + lock = unit_schedule_lock_irq(unit); if ( v->affinity_broken ) @@ -1249,6 +1308,8 @@ static int vcpu_set_affinity( sched_unit_migrate_finish(unit); + rcu_read_unlock(&sched_res_rculock); + return ret; } @@ -1375,11 +1436,16 @@ static long do_poll(struct sched_poll *sched_poll) long vcpu_yield(void) { struct vcpu * v=current; - spinlock_t *lock = unit_schedule_lock_irq(v->sched_unit); + spinlock_t *lock; + + rcu_read_lock(&sched_res_rculock); + lock = unit_schedule_lock_irq(v->sched_unit); sched_yield(vcpu_scheduler(v), v->sched_unit); unit_schedule_unlock_irq(lock, v->sched_unit); + rcu_read_unlock(&sched_res_rculock); + SCHED_STAT_CRANK(vcpu_yield); TRACE_2D(TRC_SCHED_YIELD, current->domain->domain_id, current->vcpu_id); @@ -1476,6 +1542,8 @@ int vcpu_temporary_affinity(struct vcpu *v, unsigned int cpu, uint8_t reason) int ret = -EINVAL; bool migrate; + rcu_read_lock(&sched_res_rculock); + lock = unit_schedule_lock_irq(unit); if ( cpu == NR_CPUS ) @@ -1515,6 +1583,8 @@ int vcpu_temporary_affinity(struct vcpu *v, unsigned int cpu, 
uint8_t reason) if ( migrate ) sched_unit_migrate_finish(unit); + rcu_read_unlock(&sched_res_rculock); + return ret; } @@ -1726,9 +1796,13 @@ long sched_adjust(struct domain *d, struct xen_domctl_scheduler_op *op) /* NB: the pluggable scheduler code needs to take care * of locking by itself. */ + rcu_read_lock(&sched_res_rculock); + if ( (ret = sched_adjust_dom(dom_scheduler(d), d, op)) == 0 ) TRACE_1D(TRC_SCHED_ADJDOM, d->domain_id); + rcu_read_unlock(&sched_res_rculock); + return ret; } @@ -1749,9 +1823,13 @@ long sched_adjust_global(struct xen_sysctl_scheduler_op *op) if ( pool == NULL ) return -ESRCH; + rcu_read_lock(&sched_res_rculock); + rc = ((op->sched_id == pool->sched->sched_id) ? sched_adjust_cpupool(pool->sched, op) : -EINVAL); + rcu_read_unlock(&sched_res_rculock); + cpupool_put(pool); return rc; @@ -1971,7 +2049,11 @@ static void unit_context_saved(struct sched_resource *sr) void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) { struct sched_unit *next = vnext->sched_unit; - struct sched_resource *sr = get_sched_res(smp_processor_id()); + struct sched_resource *sr; + + rcu_read_lock(&sched_res_rculock); + + sr = get_sched_res(smp_processor_id()); if ( atomic_read(&next->rendezvous_out_cnt) ) { @@ -1998,6 +2080,8 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext) if ( is_idle_vcpu(vprev) && vprev != vnext ) vprev->sched_unit = sr->sched_unit_idle; + + rcu_read_unlock(&sched_res_rculock); } static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, @@ -2021,6 +2105,8 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, vnext->sched_unit = get_sched_res(smp_processor_id())->sched_unit_idle; + rcu_read_unlock(&sched_res_rculock); + trace_continue_running(vnext); return continue_running(vprev); } @@ -2034,6 +2120,8 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext, vcpu_periodic_timer_work(vnext); + rcu_read_unlock(&sched_res_rculock); + context_switch(vprev, vnext); } @@ -2186,6 +2274,8 @@ static void sched_slave(void) ASSERT_NOT_IN_ATOMIC(); + rcu_read_lock(&sched_res_rculock); + lock = pcpu_schedule_lock_irq(cpu); now = NOW(); @@ -2209,6 +2299,8 @@ static void sched_slave(void) { pcpu_schedule_unlock_irq(lock, cpu); + rcu_read_unlock(&sched_res_rculock); + /* Check for failed forced context switch. 
*/ if ( do_softirq ) raise_softirq(SCHEDULE_SOFTIRQ); @@ -2241,13 +2333,16 @@ static void schedule(void) struct sched_resource *sr; spinlock_t *lock; int cpu = smp_processor_id(); - unsigned int gran = get_sched_res(cpu)->granularity; + unsigned int gran; ASSERT_NOT_IN_ATOMIC(); SCHED_STAT_CRANK(sched_run); + rcu_read_lock(&sched_res_rculock); + sr = get_sched_res(cpu); + gran = sr->granularity; lock = pcpu_schedule_lock_irq(cpu); @@ -2259,6 +2354,8 @@ static void schedule(void) */ pcpu_schedule_unlock_irq(lock, cpu); + rcu_read_unlock(&sched_res_rculock); + raise_softirq(SCHEDULE_SOFTIRQ); return sched_slave(); } @@ -2370,14 +2467,27 @@ static int cpu_schedule_up(unsigned int cpu) return 0; } +static void sched_res_free(struct rcu_head *head) +{ + struct sched_resource *sr = container_of(head, struct sched_resource, rcu); + + xfree(sr); +} + static void cpu_schedule_down(unsigned int cpu) { - struct sched_resource *sr = get_sched_res(cpu); + struct sched_resource *sr; + + rcu_read_lock(&sched_res_rculock); + + sr = get_sched_res(cpu); kill_timer(&sr->s_timer); set_sched_res(cpu, NULL); - xfree(sr); + call_rcu(&sr->rcu, sched_res_free); + + rcu_read_unlock(&sched_res_rculock); } void sched_rm_cpu(unsigned int cpu) @@ -2397,6 +2507,8 @@ static int cpu_schedule_callback( unsigned int cpu = (unsigned long)hcpu; int rc = 0; + rcu_read_lock(&sched_res_rculock); + /* * From the scheduler perspective, bringing up a pCPU requires * allocating and initializing the per-pCPU scheduler specific data, @@ -2443,6 +2555,8 @@ static int cpu_schedule_callback( break; } + rcu_read_unlock(&sched_res_rculock); + return !rc ? NOTIFY_DONE : notifier_from_errno(rc); } @@ -2532,8 +2646,13 @@ void __init scheduler_init(void) idle_domain->max_vcpus = nr_cpu_ids; if ( vcpu_create(idle_domain, 0) == NULL ) BUG(); + + rcu_read_lock(&sched_res_rculock); + get_sched_res(0)->curr = idle_vcpu[0]->sched_unit; get_sched_res(0)->sched_unit_idle = idle_vcpu[0]->sched_unit; + + rcu_read_unlock(&sched_res_rculock); } /* @@ -2546,9 +2665,14 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c) struct vcpu *idle; void *ppriv, *vpriv; struct scheduler *new_ops = c->sched; - struct sched_resource *sr = get_sched_res(cpu); + struct sched_resource *sr; spinlock_t *old_lock, *new_lock; unsigned long flags; + int ret = 0; + + rcu_read_lock(&sched_res_rculock); + + sr = get_sched_res(cpu); ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); ASSERT(!cpumask_test_cpu(cpu, c->cpu_valid)); @@ -2568,13 +2692,18 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c) idle = idle_vcpu[cpu]; ppriv = sched_alloc_pdata(new_ops, cpu); if ( IS_ERR(ppriv) ) - return PTR_ERR(ppriv); + { + ret = PTR_ERR(ppriv); + goto out; + } + vpriv = sched_alloc_udata(new_ops, idle->sched_unit, idle->domain->sched_priv); if ( vpriv == NULL ) { sched_free_pdata(new_ops, ppriv, cpu); - return -ENOMEM; + ret = -ENOMEM; + goto out; } /* @@ -2613,7 +2742,10 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c) /* The cpu is added to a pool, trigger it to go pick up some work */ cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ); - return 0; +out: + rcu_read_unlock(&sched_res_rculock); + + return ret; } /* @@ -2626,11 +2758,16 @@ int schedule_cpu_rm(unsigned int cpu) { struct vcpu *idle; void *ppriv_old, *vpriv_old; - struct sched_resource *sr = get_sched_res(cpu); - struct scheduler *old_ops = sr->scheduler; + struct sched_resource *sr; + struct scheduler *old_ops; spinlock_t *old_lock; unsigned long flags; + rcu_read_lock(&sched_res_rculock); + + sr = 
get_sched_res(cpu); + old_ops = sr->scheduler; + ASSERT(sr->cpupool != NULL); ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid)); @@ -2663,6 +2800,8 @@ int schedule_cpu_rm(unsigned int cpu) sr->granularity = 1; sr->cpupool = NULL; + rcu_read_unlock(&sched_res_rculock); + return 0; } @@ -2711,6 +2850,8 @@ void schedule_dump(struct cpupool *c) /* Locking, if necessary, must be handled withing each scheduler */ + rcu_read_lock(&sched_res_rculock); + if ( c != NULL ) { sched = c->sched; @@ -2730,6 +2871,8 @@ void schedule_dump(struct cpupool *c) for_each_cpu (i, cpus) sched_dump_cpu_state(sched, i); } + + rcu_read_unlock(&sched_res_rculock); } void sched_tick_suspend(void) @@ -2737,10 +2880,14 @@ void sched_tick_suspend(void) struct scheduler *sched; unsigned int cpu = smp_processor_id(); + rcu_read_lock(&sched_res_rculock); + sched = get_sched_res(cpu)->scheduler; sched_do_tick_suspend(sched, cpu); rcu_idle_enter(cpu); rcu_idle_timer_start(); + + rcu_read_unlock(&sched_res_rculock); } void sched_tick_resume(void) @@ -2748,10 +2895,14 @@ void sched_tick_resume(void) struct scheduler *sched; unsigned int cpu = smp_processor_id(); + rcu_read_lock(&sched_res_rculock); + rcu_idle_timer_stop(); rcu_idle_exit(cpu); sched = get_sched_res(cpu)->scheduler; sched_do_tick_resume(sched, cpu); + + rcu_read_unlock(&sched_res_rculock); } void wait(void) diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index f8f0f484cb..3988985ee6 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -10,6 +10,7 @@ #include #include +#include /* A global pointer to the initial cpupool (POOL0). */ extern struct cpupool *cpupool0; @@ -57,18 +58,20 @@ struct sched_resource { unsigned int master_cpu; unsigned int granularity; const cpumask_t *cpus; /* cpus covered by this struct */ + struct rcu_head rcu; }; DECLARE_PER_CPU(struct sched_resource *, sched_res); +extern rcu_read_lock_t sched_res_rculock; static inline struct sched_resource *get_sched_res(unsigned int cpu) { - return per_cpu(sched_res, cpu); + return rcu_dereference(per_cpu(sched_res, cpu)); } static inline void set_sched_res(unsigned int cpu, struct sched_resource *res) { - per_cpu(sched_res, cpu) = res; + rcu_assign_pointer(per_cpu(sched_res, cpu), res); } static inline struct sched_unit *curr_on_cpu(unsigned int cpu) From patchwork Mon Sep 30 05:21:31 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166027 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id A680517D4 for ; Mon, 30 Sep 2019 05:23:15 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id 8CA3F20815 for ; Mon, 30 Sep 2019 05:23:15 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 8CA3F20815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8k-0002cy-7P; Mon, 30 Sep 2019 05:22:30 +0000 Received: from 
us1-rack-iad1.inumbo.com ([172.99.69.81]) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8j-0002bx-BD for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:29 +0000 X-Inumbo-ID: 31b6ca00-e342-11e9-bf31-bc764e2007e4 Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 31b6ca00-e342-11e9-bf31-bc764e2007e4; Mon, 30 Sep 2019 05:21:45 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 4D373B114; Mon, 30 Sep 2019 05:21:43 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:31 +0200 Message-Id: <20190930052135.11257-16-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 15/19] xen/sched: support multiple cpus per scheduling resource X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Julien Grall , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Prepare supporting multiple cpus per scheduling resource by allocating the cpumask per resource dynamically. Modify sched_res_mask to have only one bit per scheduling resource set. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V1: new patch (carved out from other patch) V4: - use cpumask_t for sched_res_mask (Jan Beulich) - clear cpu in sched_res_mask when taking cpu away (Jan Beulich) --- xen/common/cpupool.c | 4 ++-- xen/common/schedule.c | 15 +++++++++++++-- xen/include/xen/sched-if.h | 4 ++-- 3 files changed, 17 insertions(+), 6 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 7228ca84b4..13dffaadcf 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -283,7 +283,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) cpupool_cpu_moving = NULL; } cpumask_set_cpu(cpu, c->cpu_valid); - cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask); + cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); rcu_read_lock(&domlist_read_lock); for_each_domain_in_cpupool(d, c) @@ -376,7 +376,7 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) atomic_inc(&c->refcnt); cpupool_cpu_moving = c; cpumask_clear_cpu(cpu, c->cpu_valid); - cpumask_and(c->res_valid, c->cpu_valid, sched_res_mask); + cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); out: spin_unlock(&cpupool_lock); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 1f23bf0e83..efe077b01f 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -63,7 +63,7 @@ integer_param("sched_ratelimit_us", sched_ratelimit_us); /* Number of vcpus per struct sched_unit. */ bool __read_mostly sched_disable_smt_switching; -const cpumask_t *sched_res_mask = &cpumask_all; +cpumask_t sched_res_mask; /* Common lock for free cpus. 
*/ static DEFINE_SPINLOCK(sched_free_cpu_lock); @@ -2426,8 +2426,14 @@ static int cpu_schedule_up(unsigned int cpu) sr = xzalloc(struct sched_resource); if ( sr == NULL ) return -ENOMEM; + if ( !zalloc_cpumask_var(&sr->cpus) ) + { + xfree(sr); + return -ENOMEM; + } + sr->master_cpu = cpu; - sr->cpus = cpumask_of(cpu); + cpumask_copy(sr->cpus, cpumask_of(cpu)); set_sched_res(cpu, sr); sr->scheduler = &sched_idle_ops; @@ -2439,6 +2445,8 @@ static int cpu_schedule_up(unsigned int cpu) /* We start with cpu granularity. */ sr->granularity = 1; + cpumask_set_cpu(cpu, &sched_res_mask); + /* Boot CPU is dealt with later in scheduler_init(). */ if ( cpu == 0 ) return 0; @@ -2471,6 +2479,7 @@ static void sched_res_free(struct rcu_head *head) { struct sched_resource *sr = container_of(head, struct sched_resource, rcu); + free_cpumask_var(sr->cpus); xfree(sr); } @@ -2484,7 +2493,9 @@ static void cpu_schedule_down(unsigned int cpu) kill_timer(&sr->s_timer); + cpumask_clear_cpu(cpu, &sched_res_mask); set_sched_res(cpu, NULL); + call_rcu(&sr->rcu, sched_res_free); rcu_read_unlock(&sched_res_rculock); diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 3988985ee6..780735dda3 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -24,7 +24,7 @@ extern cpumask_t cpupool_free_cpus; extern int sched_ratelimit_us; /* Scheduling resource mask. */ -extern const cpumask_t *sched_res_mask; +extern cpumask_t sched_res_mask; /* Number of vcpus per struct sched_unit. */ enum sched_gran { @@ -57,7 +57,7 @@ struct sched_resource { /* Cpu with lowest id in scheduling resource. */ unsigned int master_cpu; unsigned int granularity; - const cpumask_t *cpus; /* cpus covered by this struct */ + cpumask_var_t cpus; /* cpus covered by this struct */ struct rcu_head rcu; }; From patchwork Mon Sep 30 05:21:32 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166035 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id D4E0E16C1 for ; Mon, 30 Sep 2019 05:23:20 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id AFF1E218AC for ; Mon, 30 Sep 2019 05:23:20 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org AFF1E218AC Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8P-0002Bp-4m; Mon, 30 Sep 2019 05:22:09 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8N-0002Aj-IA for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:07 +0000 X-Inumbo-ID: 31b6aa70-e342-11e9-96c8-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 31b6aa70-e342-11e9-96c8-12813bfff9fa; Mon, 30 Sep 2019 05:21:45 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) 
with ESMTP id 8191BB11F; Mon, 30 Sep 2019 05:21:43 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:32 +0200 Message-Id: <20190930052135.11257-17-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 16/19] xen/sched: support differing granularity in schedule_cpu_[add/rm]() X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , George Dunlap , Dario Faggioli MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" With core scheduling active schedule_cpu_[add/rm]() has to cope with different scheduling granularity: a cpu not in any cpupool is subject to granularity 1 (cpu scheduling), while a cpu in a cpupool might be in a scheduling resource with more than one cpu. Handle that by having arrays of old/new pdata and vdata and loop over those where appropriate. Additionally the scheduling resource(s) must either be merged or split. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- xen/common/cpupool.c | 18 ++-- xen/common/schedule.c | 226 +++++++++++++++++++++++++++++++++++++++++++------- 2 files changed, 204 insertions(+), 40 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 13dffaadcf..04c3b3c04b 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -536,6 +536,7 @@ static void cpupool_cpu_remove(unsigned int cpu) ret = cpupool_unassign_cpu_finish(cpupool0); BUG_ON(ret); } + cpumask_clear_cpu(cpu, &cpupool_free_cpus); } /* @@ -585,20 +586,19 @@ static void cpupool_cpu_remove_forced(unsigned int cpu) struct cpupool **c; int ret; - if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) ) - cpumask_clear_cpu(cpu, &cpupool_free_cpus); - else + for_each_cpupool ( c ) { - for_each_cpupool(c) + if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) ) { - if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) ) - { - ret = cpupool_unassign_cpu(*c, cpu); - BUG_ON(ret); - } + ret = cpupool_unassign_cpu_start(*c, cpu); + BUG_ON(ret); + ret = cpupool_unassign_cpu_finish(*c); + BUG_ON(ret); } } + cpumask_clear_cpu(cpu, &cpupool_free_cpus); + rcu_read_lock(&sched_res_rculock); sched_rm_cpu(cpu); rcu_read_unlock(&sched_res_rculock); diff --git a/xen/common/schedule.c b/xen/common/schedule.c index efe077b01f..e411b6d03e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -425,27 +425,30 @@ static void sched_unit_add_vcpu(struct sched_unit *unit, struct vcpu *v) unit->runstate_cnt[v->runstate.state]++; } -static struct sched_unit *sched_alloc_unit(struct vcpu *v) +static struct sched_unit *sched_alloc_unit_mem(void) { - struct sched_unit *unit, **prev_unit; - struct domain *d = v->domain; - unsigned int gran = cpupool_get_granularity(d->cpupool); + struct sched_unit *unit; - for_each_sched_unit ( d, unit ) - if ( unit->unit_id / gran == v->vcpu_id / gran ) - break; + unit = xzalloc(struct sched_unit); + if ( !unit ) + return NULL; - if ( unit ) + if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) || + !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) || + !zalloc_cpumask_var(&unit->cpu_soft_affinity) ) { - sched_unit_add_vcpu(unit, v); - return unit; + sched_free_unit_mem(unit); + unit = NULL; } - if ( (unit = xzalloc(struct sched_unit)) == NULL ) - return NULL; + return unit; +} + +static void 
sched_domain_insert_unit(struct sched_unit *unit, struct domain *d) +{ + struct sched_unit **prev_unit; unit->domain = d; - sched_unit_add_vcpu(unit, v); for ( prev_unit = &d->sched_unit_list; *prev_unit; prev_unit = &(*prev_unit)->next_in_list ) @@ -455,17 +458,31 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v) unit->next_in_list = *prev_unit; *prev_unit = unit; +} - if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) || - !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) || - !zalloc_cpumask_var(&unit->cpu_soft_affinity) ) - goto fail; +static struct sched_unit *sched_alloc_unit(struct vcpu *v) +{ + struct sched_unit *unit; + struct domain *d = v->domain; + unsigned int gran = cpupool_get_granularity(d->cpupool); - return unit; + for_each_sched_unit ( d, unit ) + if ( unit->unit_id / gran == v->vcpu_id / gran ) + break; - fail: - sched_free_unit(unit, v); - return NULL; + if ( unit ) + { + sched_unit_add_vcpu(unit, v); + return unit; + } + + if ( (unit = sched_alloc_unit_mem()) == NULL ) + return NULL; + + sched_unit_add_vcpu(unit, v); + sched_domain_insert_unit(unit, d); + + return unit; } static unsigned int sched_select_initial_cpu(const struct vcpu *v) @@ -2419,18 +2436,28 @@ static void poll_timer_fn(void *data) vcpu_unblock(v); } -static int cpu_schedule_up(unsigned int cpu) +static struct sched_resource *sched_alloc_res(void) { struct sched_resource *sr; sr = xzalloc(struct sched_resource); if ( sr == NULL ) - return -ENOMEM; + return NULL; if ( !zalloc_cpumask_var(&sr->cpus) ) { xfree(sr); - return -ENOMEM; + return NULL; } + return sr; +} + +static int cpu_schedule_up(unsigned int cpu) +{ + struct sched_resource *sr; + + sr = sched_alloc_res(); + if ( sr == NULL ) + return -ENOMEM; sr->master_cpu = cpu; cpumask_copy(sr->cpus, cpumask_of(cpu)); @@ -2480,6 +2507,8 @@ static void sched_res_free(struct rcu_head *head) struct sched_resource *sr = container_of(head, struct sched_resource, rcu); free_cpumask_var(sr->cpus); + if ( sr->sched_unit_idle ) + sched_free_unit_mem(sr->sched_unit_idle); xfree(sr); } @@ -2496,6 +2525,8 @@ static void cpu_schedule_down(unsigned int cpu) cpumask_clear_cpu(cpu, &sched_res_mask); set_sched_res(cpu, NULL); + /* Keep idle unit. */ + sr->sched_unit_idle = NULL; call_rcu(&sr->rcu, sched_res_free); rcu_read_unlock(&sched_res_rculock); @@ -2575,6 +2606,30 @@ static struct notifier_block cpu_schedule_nfb = { .notifier_call = cpu_schedule_callback }; +static const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, + unsigned int cpu) +{ + const cpumask_t *mask; + + switch ( opt ) + { + case SCHED_GRAN_cpu: + mask = cpumask_of(cpu); + break; + case SCHED_GRAN_core: + mask = per_cpu(cpu_sibling_mask, cpu); + break; + case SCHED_GRAN_socket: + mask = per_cpu(cpu_core_mask, cpu); + break; + default: + ASSERT_UNREACHABLE(); + return NULL; + } + + return mask; +} + /* Initialise the data structures. */ void __init scheduler_init(void) { @@ -2730,6 +2785,46 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c) */ old_lock = pcpu_schedule_lock_irqsave(cpu, &flags); + if ( cpupool_get_granularity(c) > 1 ) + { + const cpumask_t *mask; + unsigned int cpu_iter, idx = 0; + struct sched_unit *old_unit, *master_unit; + struct sched_resource *sr_old; + + /* + * We need to merge multiple idle_vcpu units and sched_resource structs + * into one. As the free cpus all share the same lock we are fine doing + * that now. The worst which could happen would be someone waiting for + * the lock, thus dereferencing sched_res->schedule_lock. 
This is the + * reason we are freeing struct sched_res via call_rcu() to avoid the + * lock pointer suddenly disappearing. + */ + mask = sched_get_opt_cpumask(c->gran, cpu); + master_unit = idle_vcpu[cpu]->sched_unit; + + for_each_cpu ( cpu_iter, mask ) + { + if ( idx ) + cpumask_clear_cpu(cpu_iter, &sched_res_mask); + + per_cpu(sched_res_idx, cpu_iter) = idx++; + + if ( cpu == cpu_iter ) + continue; + + old_unit = idle_vcpu[cpu_iter]->sched_unit; + sr_old = get_sched_res(cpu_iter); + kill_timer(&sr_old->s_timer); + idle_vcpu[cpu_iter]->sched_unit = master_unit; + master_unit->runstate_cnt[RUNSTATE_running]++; + set_sched_res(cpu_iter, sr); + cpumask_set_cpu(cpu_iter, sr->cpus); + + call_rcu(&sr_old->rcu, sched_res_free); + } + } + new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv); sr->scheduler = new_ops; @@ -2767,33 +2862,100 @@ out: */ int schedule_cpu_rm(unsigned int cpu) { - struct vcpu *idle; void *ppriv_old, *vpriv_old; - struct sched_resource *sr; + struct sched_resource *sr, **sr_new = NULL; + struct sched_unit *unit; struct scheduler *old_ops; spinlock_t *old_lock; unsigned long flags; + int idx, ret = -ENOMEM; + unsigned int cpu_iter; rcu_read_lock(&sched_res_rculock); sr = get_sched_res(cpu); old_ops = sr->scheduler; + if ( sr->granularity > 1 ) + { + sr_new = xmalloc_array(struct sched_resource *, sr->granularity - 1); + if ( !sr_new ) + goto out; + for ( idx = 0; idx < sr->granularity - 1; idx++ ) + { + sr_new[idx] = sched_alloc_res(); + if ( sr_new[idx] ) + { + sr_new[idx]->sched_unit_idle = sched_alloc_unit_mem(); + if ( !sr_new[idx]->sched_unit_idle ) + { + sched_res_free(&sr_new[idx]->rcu); + sr_new[idx] = NULL; + } + } + if ( !sr_new[idx] ) + { + for ( idx--; idx >= 0; idx-- ) + sched_res_free(&sr_new[idx]->rcu); + goto out; + } + sr_new[idx]->curr = sr_new[idx]->sched_unit_idle; + sr_new[idx]->scheduler = &sched_idle_ops; + sr_new[idx]->granularity = 1; + + /* We want the lock not to change when replacing the resource. */ + sr_new[idx]->schedule_lock = sr->schedule_lock; + } + } + + ret = 0; ASSERT(sr->cpupool != NULL); ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus)); ASSERT(!cpumask_test_cpu(cpu, sr->cpupool->cpu_valid)); - idle = idle_vcpu[cpu]; - sched_do_tick_suspend(old_ops, cpu); /* See comment in schedule_cpu_add() regarding lock switching. */ old_lock = pcpu_schedule_lock_irqsave(cpu, &flags); - vpriv_old = idle->sched_unit->priv; + vpriv_old = idle_vcpu[cpu]->sched_unit->priv; ppriv_old = sr->sched_priv; - idle->sched_unit->priv = NULL; + idx = 0; + for_each_cpu ( cpu_iter, sr->cpus ) + { + per_cpu(sched_res_idx, cpu_iter) = 0; + if ( cpu_iter == cpu ) + { + idle_vcpu[cpu_iter]->sched_unit->priv = NULL; + } + else + { + /* Initialize unit. */ + unit = sr_new[idx]->sched_unit_idle; + unit->res = sr_new[idx]; + unit->is_running = true; + sched_unit_add_vcpu(unit, idle_vcpu[cpu_iter]); + sched_domain_insert_unit(unit, idle_vcpu[cpu_iter]->domain); + + /* Adjust cpu masks of resources (old and new). */ + cpumask_clear_cpu(cpu_iter, sr->cpus); + cpumask_set_cpu(cpu_iter, sr_new[idx]->cpus); + + /* Init timer. */ + init_timer(&sr_new[idx]->s_timer, s_timer_fn, NULL, cpu_iter); + + /* Last resource initializations and insert resource pointer. */ + sr_new[idx]->master_cpu = cpu_iter; + set_sched_res(cpu_iter, sr_new[idx]); + + /* Last action: set the new lock pointer. 
*/ + smp_mb(); + sr_new[idx]->schedule_lock = &sched_free_cpu_lock; + + idx++; + } + } sr->scheduler = &sched_idle_ops; sr->sched_priv = NULL; @@ -2811,9 +2973,11 @@ int schedule_cpu_rm(unsigned int cpu) sr->granularity = 1; sr->cpupool = NULL; +out: rcu_read_unlock(&sched_res_rculock); + xfree(sr_new); - return 0; + return ret; } struct scheduler *scheduler_get_default(void) From patchwork Mon Sep 30 05:21:33 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166041 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id EB51016C1 for ; Mon, 30 Sep 2019 05:23:30 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id C61F720815 for ; Mon, 30 Sep 2019 05:23:30 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C61F720815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8Z-0002Nj-7I; Mon, 30 Sep 2019 05:22:19 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8X-0002Lt-Gq for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:17 +0000 X-Inumbo-ID: 31e19c8a-e342-11e9-96c8-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 31e19c8a-e342-11e9-96c8-12813bfff9fa; Mon, 30 Sep 2019 05:21:46 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id D2CDEB122; Mon, 30 Sep 2019 05:21:43 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:33 +0200 Message-Id: <20190930052135.11257-18-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 17/19] xen/sched: support core scheduling for moving cpus to/from cpupools X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Tim Deegan , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Dario Faggioli , Julien Grall , Jan Beulich MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" With core scheduling active it is necessary to move multiple cpus at the same time to or from a cpupool in order to avoid split scheduling resources in between. 
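To make the intended behaviour concrete, here is a standalone sketch (plain C, not Xen code; the bitmask layout, SIBLINGS_PER_CORE and the helper names are simplified assumptions) of assigning a cpu to a pool by always moving the complete sibling set of its scheduling resource, so a core can never end up split between the free set and a cpupool:

    /*
     * Standalone sketch: model cpus as bits in a 64-bit mask and move the
     * whole sibling set of a scheduling resource in one go.
     */
    #include <stdint.h>
    #include <stdio.h>

    #define SIBLINGS_PER_CORE 2              /* assumed granularity: core (SMT2) */

    static uint64_t free_cpus = 0xffULL;     /* cpus 0-7 start out free */
    static uint64_t pool_valid;              /* cpus currently in the pool */

    /* All cpus sharing a scheduling resource with @cpu (here: its core). */
    static uint64_t resource_mask(unsigned int cpu)
    {
        unsigned int first = cpu - (cpu % SIBLINGS_PER_CORE);

        return ((1ULL << SIBLINGS_PER_CORE) - 1) << first;
    }

    static int pool_assign(unsigned int cpu)
    {
        uint64_t cpus = resource_mask(cpu);

        if ( (free_cpus & cpus) != cpus )    /* the whole core must be free */
            return -1;

        free_cpus &= ~cpus;                  /* cf. cpumask_andnot() in the patch */
        pool_valid |= cpus;                  /* cf. cpumask_or() in the patch */

        return 0;
    }

    int main(void)
    {
        if ( pool_assign(3) == 0 )           /* pulls in cpus 2 and 3 together */
            printf("pool: %#lx free: %#lx\n",
                   (unsigned long)pool_valid, (unsigned long)free_cpus);

        return 0;
    }

The patch below achieves the same effect with cpumask_or()/cpumask_andnot() on the resource's cpus mask and by handing the first cpu of that mask to schedule_cpu_add(), so c->cpu_valid always remains a union of whole scheduling resources.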
Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- V1: new patch --- xen/common/cpupool.c | 100 +++++++++++++++++++++++++++++++++------------ xen/common/schedule.c | 3 +- xen/include/xen/sched-if.h | 1 + 3 files changed, 76 insertions(+), 28 deletions(-) diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index 04c3b3c04b..f7a13c7a4c 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -268,23 +268,30 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) { int ret; struct domain *d; + const cpumask_t *cpus; + + cpus = sched_get_opt_cpumask(c->gran, cpu); if ( (cpupool_moving_cpu == cpu) && (c != cpupool_cpu_moving) ) return -EADDRNOTAVAIL; - ret = schedule_cpu_add(cpu, c); + ret = schedule_cpu_add(cpumask_first(cpus), c); if ( ret ) return ret; - cpumask_clear_cpu(cpu, &cpupool_free_cpus); + rcu_read_lock(&sched_res_rculock); + + cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus); if (cpupool_moving_cpu == cpu) { cpupool_moving_cpu = -1; cpupool_put(cpupool_cpu_moving); cpupool_cpu_moving = NULL; } - cpumask_set_cpu(cpu, c->cpu_valid); + cpumask_or(c->cpu_valid, c->cpu_valid, cpus); cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); + rcu_read_unlock(&sched_res_rculock); + rcu_read_lock(&domlist_read_lock); for_each_domain_in_cpupool(d, c) { @@ -298,6 +305,7 @@ static int cpupool_assign_cpu_locked(struct cpupool *c, unsigned int cpu) static int cpupool_unassign_cpu_finish(struct cpupool *c) { int cpu = cpupool_moving_cpu; + const cpumask_t *cpus; struct domain *d; int ret; @@ -310,7 +318,10 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c) */ rcu_read_lock(&domlist_read_lock); ret = cpu_disable_scheduler(cpu); - cpumask_set_cpu(cpu, &cpupool_free_cpus); + + rcu_read_lock(&sched_res_rculock); + cpus = get_sched_res(cpu)->cpus; + cpumask_or(&cpupool_free_cpus, &cpupool_free_cpus, cpus); /* * cpu_disable_scheduler() returning an error doesn't require resetting @@ -323,7 +334,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c) { ret = schedule_cpu_rm(cpu); if ( ret ) - cpumask_clear_cpu(cpu, &cpupool_free_cpus); + cpumask_andnot(&cpupool_free_cpus, &cpupool_free_cpus, cpus); else { cpupool_moving_cpu = -1; @@ -331,6 +342,7 @@ static int cpupool_unassign_cpu_finish(struct cpupool *c) cpupool_cpu_moving = NULL; } } + rcu_read_unlock(&sched_res_rculock); for_each_domain_in_cpupool(d, c) { @@ -345,6 +357,7 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) { int ret; struct domain *d; + const cpumask_t *cpus; spin_lock(&cpupool_lock); ret = -EADDRNOTAVAIL; @@ -353,7 +366,11 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) goto out; ret = 0; - if ( (c->n_dom > 0) && (cpumask_weight(c->cpu_valid) == 1) && + rcu_read_lock(&sched_res_rculock); + cpus = get_sched_res(cpu)->cpus; + + if ( (c->n_dom > 0) && + (cpumask_weight(c->cpu_valid) == cpumask_weight(cpus)) && (cpu != cpupool_moving_cpu) ) { rcu_read_lock(&domlist_read_lock); @@ -375,9 +392,10 @@ static int cpupool_unassign_cpu_start(struct cpupool *c, unsigned int cpu) cpupool_moving_cpu = cpu; atomic_inc(&c->refcnt); cpupool_cpu_moving = c; - cpumask_clear_cpu(cpu, c->cpu_valid); + cpumask_andnot(c->cpu_valid, c->cpu_valid, cpus); cpumask_and(c->res_valid, c->cpu_valid, &sched_res_mask); + rcu_read_unlock(&domlist_read_lock); out: spin_unlock(&cpupool_lock); @@ -417,11 +435,13 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu) { int work_cpu; int ret; + unsigned int master_cpu; 
debugtrace_printk("cpupool_unassign_cpu(pool=%d,cpu=%d)\n", c->cpupool_id, cpu); - ret = cpupool_unassign_cpu_start(c, cpu); + master_cpu = sched_get_resource_cpu(cpu); + ret = cpupool_unassign_cpu_start(c, master_cpu); if ( ret ) { debugtrace_printk("cpupool_unassign_cpu(pool=%d,cpu=%d) ret %d\n", @@ -429,12 +449,12 @@ static int cpupool_unassign_cpu(struct cpupool *c, unsigned int cpu) return ret; } - work_cpu = smp_processor_id(); - if ( work_cpu == cpu ) + work_cpu = sched_get_resource_cpu(smp_processor_id()); + if ( work_cpu == master_cpu ) { work_cpu = cpumask_first(cpupool0->cpu_valid); - if ( work_cpu == cpu ) - work_cpu = cpumask_next(cpu, cpupool0->cpu_valid); + if ( work_cpu == master_cpu ) + work_cpu = cpumask_last(cpupool0->cpu_valid); } return continue_hypercall_on_cpu(work_cpu, cpupool_unassign_cpu_helper, c); } @@ -500,6 +520,7 @@ void cpupool_rm_domain(struct domain *d) static int cpupool_cpu_add(unsigned int cpu) { int ret = 0; + const cpumask_t *cpus; spin_lock(&cpupool_lock); cpumask_clear_cpu(cpu, &cpupool_locked_cpus); @@ -513,7 +534,11 @@ static int cpupool_cpu_add(unsigned int cpu) */ rcu_read_lock(&sched_res_rculock); get_sched_res(cpu)->cpupool = NULL; - ret = cpupool_assign_cpu_locked(cpupool0, cpu); + + cpus = sched_get_opt_cpumask(cpupool0->gran, cpu); + if ( cpumask_subset(cpus, &cpupool_free_cpus) ) + ret = cpupool_assign_cpu_locked(cpupool0, cpu); + rcu_read_unlock(&sched_res_rculock); spin_unlock(&cpupool_lock); @@ -548,27 +573,33 @@ static void cpupool_cpu_remove(unsigned int cpu) static int cpupool_cpu_remove_prologue(unsigned int cpu) { int ret = 0; + cpumask_t *cpus; + unsigned int master_cpu; spin_lock(&cpupool_lock); - if ( cpumask_test_cpu(cpu, &cpupool_locked_cpus) ) + rcu_read_lock(&sched_res_rculock); + cpus = get_sched_res(cpu)->cpus; + master_cpu = sched_get_resource_cpu(cpu); + if ( cpumask_intersects(cpus, &cpupool_locked_cpus) ) ret = -EBUSY; else cpumask_set_cpu(cpu, &cpupool_locked_cpus); + rcu_read_unlock(&sched_res_rculock); spin_unlock(&cpupool_lock); if ( ret ) return ret; - if ( cpumask_test_cpu(cpu, cpupool0->cpu_valid) ) + if ( cpumask_test_cpu(master_cpu, cpupool0->cpu_valid) ) { /* Cpupool0 is populated only after all cpus are up. 
*/ ASSERT(system_state == SYS_STATE_active); - ret = cpupool_unassign_cpu_start(cpupool0, cpu); + ret = cpupool_unassign_cpu_start(cpupool0, master_cpu); } - else if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) ) + else if ( !cpumask_test_cpu(master_cpu, &cpupool_free_cpus) ) ret = -ENODEV; return ret; @@ -585,12 +616,13 @@ static void cpupool_cpu_remove_forced(unsigned int cpu) { struct cpupool **c; int ret; + unsigned int master_cpu = sched_get_resource_cpu(cpu); for_each_cpupool ( c ) { - if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) ) + if ( cpumask_test_cpu(master_cpu, (*c)->cpu_valid) ) { - ret = cpupool_unassign_cpu_start(*c, cpu); + ret = cpupool_unassign_cpu_start(*c, master_cpu); BUG_ON(ret); ret = cpupool_unassign_cpu_finish(*c); BUG_ON(ret); @@ -658,29 +690,45 @@ int cpupool_do_sysctl(struct xen_sysctl_cpupool_op *op) case XEN_SYSCTL_CPUPOOL_OP_ADDCPU: { unsigned cpu; + const cpumask_t *cpus; cpu = op->cpu; debugtrace_printk("cpupool_assign_cpu(pool=%d,cpu=%d)\n", op->cpupool_id, cpu); + spin_lock(&cpupool_lock); + + c = cpupool_find_by_id(op->cpupool_id); + ret = -ENOENT; + if ( c == NULL ) + goto addcpu_out; if ( cpu == XEN_SYSCTL_CPUPOOL_PAR_ANY ) - cpu = cpumask_first(&cpupool_free_cpus); + { + for_each_cpu ( cpu, &cpupool_free_cpus ) + { + cpus = sched_get_opt_cpumask(c->gran, cpu); + if ( cpumask_subset(cpus, &cpupool_free_cpus) ) + break; + } + ret = -ENODEV; + if ( cpu >= nr_cpu_ids ) + goto addcpu_out; + } ret = -EINVAL; if ( cpu >= nr_cpu_ids ) goto addcpu_out; ret = -ENODEV; - if ( !cpumask_test_cpu(cpu, &cpupool_free_cpus) || - cpumask_test_cpu(cpu, &cpupool_locked_cpus) ) - goto addcpu_out; - c = cpupool_find_by_id(op->cpupool_id); - ret = -ENOENT; - if ( c == NULL ) + cpus = sched_get_opt_cpumask(c->gran, cpu); + if ( !cpumask_subset(cpus, &cpupool_free_cpus) || + cpumask_intersects(cpus, &cpupool_locked_cpus) ) goto addcpu_out; ret = cpupool_assign_cpu_locked(c, cpu); + addcpu_out: spin_unlock(&cpupool_lock); debugtrace_printk("cpupool_assign_cpu(pool=%d,cpu=%d) ret %d\n", op->cpupool_id, cpu, ret); + } break; diff --git a/xen/common/schedule.c b/xen/common/schedule.c index e411b6d03e..48ddbdfd7e 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -2606,8 +2606,7 @@ static struct notifier_block cpu_schedule_nfb = { .notifier_call = cpu_schedule_callback }; -static const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, - unsigned int cpu) +const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu) { const cpumask_t *mask; diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index 780735dda3..cd731d7172 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -638,5 +638,6 @@ affinity_balance_cpumask(const struct sched_unit *unit, int step, } void sched_rm_cpu(unsigned int cpu); +const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu); #endif /* __XEN_SCHED_IF_H__ */ From patchwork Mon Sep 30 05:21:34 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166039 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 6F53616C1 for ; Mon, 30 Sep 2019 05:23:24 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by 
mail.kernel.org (Postfix) with ESMTPS id 5552C20815 for ; Mon, 30 Sep 2019 05:23:24 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 5552C20815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8U-0002HY-4h; Mon, 30 Sep 2019 05:22:14 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8S-0002G9-IS for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:12 +0000 X-Inumbo-ID: 31e19c8b-e342-11e9-96c8-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 31e19c8b-e342-11e9-96c8-12813bfff9fa; Mon, 30 Sep 2019 05:21:46 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 33539B127; Mon, 30 Sep 2019 05:21:44 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:34 +0200 Message-Id: <20190930052135.11257-19-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 18/19] xen/sched: disable scheduling when entering ACPI deep sleep states X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Julien Grall , Jan Beulich , Dario Faggioli , =?utf-8?q?Roger_Pau_Monn=C3=A9?= MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" When entering deep sleep states all domains are paused resulting in all cpus only running idle vcpus. This enables us to stop scheduling completely in order to avoid synchronization problems with core scheduling when individual cpus are offlined. Disabling the scheduler is done by replacing the softirq handler with a dummy scheduling routine only enabling tasklets to run. 
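As a minimal model of that handler swap, here is a standalone sketch (ordinary C, not the hypervisor code; open_softirq() and the handler bodies are simplified stand-ins) showing the scheduler being disabled by installing a dummy handler that only checks for tasklet work, and re-enabled by restoring the real one:

    /*
     * Standalone sketch: "disabling" the scheduler swaps the softirq handler
     * for a dummy that only checks for tasklet work; enabling restores the
     * real handler.
     */
    #include <stdbool.h>
    #include <stdio.h>

    typedef void (*softirq_handler)(void);

    static softirq_handler schedule_softirq_handler;
    static bool scheduler_active;

    static void real_schedule(void)  { puts("full scheduling decision"); }
    static void schedule_dummy(void) { puts("only run pending tasklets"); }

    static void open_softirq(softirq_handler fn)
    {
        schedule_softirq_handler = fn;
    }

    static void scheduler_disable(void)
    {
        scheduler_active = false;
        open_softirq(schedule_dummy);
    }

    static void scheduler_enable(void)
    {
        open_softirq(real_schedule);
        scheduler_active = true;
    }

    int main(void)
    {
        scheduler_enable();
        schedule_softirq_handler();          /* full scheduling decision */

        scheduler_disable();                 /* e.g. before entering S3 */
        schedule_softirq_handler();          /* only run pending tasklets */

        scheduler_enable();                  /* on resume */

        return 0;
    }

In the patch below the same pattern is applied to both SCHEDULE_SOFTIRQ and SCHED_SLAVE_SOFTIRQ, and the scheduler_active flag additionally lets sched_wait_rendezvous_in() reset the rendezvous counters instead of waiting for sibling cpus while scheduling is disabled.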
Signed-off-by: Juergen Gross Acked-by: Jan Beulich Reviewed-by: Dario Faggioli --- V2: new patch --- xen/arch/x86/acpi/power.c | 4 ++++ xen/common/schedule.c | 31 +++++++++++++++++++++++++++++-- xen/include/xen/sched.h | 2 ++ 3 files changed, 35 insertions(+), 2 deletions(-) diff --git a/xen/arch/x86/acpi/power.c b/xen/arch/x86/acpi/power.c index 01e6aec4e8..8078352312 100644 --- a/xen/arch/x86/acpi/power.c +++ b/xen/arch/x86/acpi/power.c @@ -145,12 +145,16 @@ static void freeze_domains(void) for_each_domain ( d ) domain_pause(d); rcu_read_unlock(&domlist_read_lock); + + scheduler_disable(); } static void thaw_domains(void) { struct domain *d; + scheduler_enable(); + rcu_read_lock(&domlist_read_lock); for_each_domain ( d ) { diff --git a/xen/common/schedule.c b/xen/common/schedule.c index 48ddbdfd7e..dbffec8cf2 100644 --- a/xen/common/schedule.c +++ b/xen/common/schedule.c @@ -91,6 +91,8 @@ extern const struct scheduler *__start_schedulers_array[], *__end_schedulers_arr static struct scheduler __read_mostly ops; +static bool scheduler_active; + static void sched_set_affinity( struct sched_unit *unit, const cpumask_t *hard, const cpumask_t *soft); @@ -2275,6 +2277,13 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev, cpu_relax(); *lock = pcpu_schedule_lock_irq(cpu); + + if ( unlikely(!scheduler_active) ) + { + ASSERT(is_idle_unit(prev)); + atomic_set(&prev->next_task->rendezvous_out_cnt, 0); + prev->rendezvous_in_cnt = 0; + } } return prev->next_task; @@ -2629,14 +2638,32 @@ const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt, unsigned int cpu) return mask; } +static void schedule_dummy(void) +{ + sched_tasklet_check_cpu(smp_processor_id()); +} + +void scheduler_disable(void) +{ + scheduler_active = false; + open_softirq(SCHEDULE_SOFTIRQ, schedule_dummy); + open_softirq(SCHED_SLAVE_SOFTIRQ, schedule_dummy); +} + +void scheduler_enable(void) +{ + open_softirq(SCHEDULE_SOFTIRQ, schedule); + open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave); + scheduler_active = true; +} + /* Initialise the data structures. 
*/ void __init scheduler_init(void) { struct domain *idle_domain; int i; - open_softirq(SCHEDULE_SOFTIRQ, schedule); - open_softirq(SCHED_SLAVE_SOFTIRQ, sched_slave); + scheduler_enable(); for ( i = 0; i < NUM_SCHEDULERS; i++) { diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h index a40bd5fb56..629a4c52e0 100644 --- a/xen/include/xen/sched.h +++ b/xen/include/xen/sched.h @@ -933,6 +933,8 @@ void restore_vcpu_affinity(struct domain *d); void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate); uint64_t get_cpu_idle_time(unsigned int cpu); void sched_guest_idle(void (*idle) (void), unsigned int cpu); +void scheduler_enable(void); +void scheduler_disable(void); /* * Used by idle loop to decide whether there is work to do: From patchwork Mon Sep 30 05:21:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: =?utf-8?b?SsO8cmdlbiBHcm/Dnw==?= X-Patchwork-Id: 11166023 Return-Path: Received: from mail.kernel.org (pdx-korg-mail-1.web.codeaurora.org [172.30.200.123]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id BA83313B1 for ; Mon, 30 Sep 2019 05:23:12 +0000 (UTC) Received: from lists.xenproject.org (lists.xenproject.org [192.237.175.120]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by mail.kernel.org (Postfix) with ESMTPS id A0FDA20815 for ; Mon, 30 Sep 2019 05:23:12 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A0FDA20815 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=xen-devel-bounces@lists.xenproject.org Received: from localhost ([127.0.0.1] helo=lists.xenproject.org) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8e-0002UE-2k; Mon, 30 Sep 2019 05:22:24 +0000 Received: from all-amaz-eas1.inumbo.com ([34.197.232.57] helo=us1-amaz-eas2.inumbo.com) by lists.xenproject.org with esmtp (Exim 4.89) (envelope-from ) id 1iEo8c-0002S9-Ix for xen-devel@lists.xenproject.org; Mon, 30 Sep 2019 05:22:22 +0000 X-Inumbo-ID: 31b6aa72-e342-11e9-96c8-12813bfff9fa Received: from mx1.suse.de (unknown [195.135.220.15]) by localhost (Halon) with ESMTPS id 31b6aa72-e342-11e9-96c8-12813bfff9fa; Mon, 30 Sep 2019 05:21:47 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.220.254]) by mx1.suse.de (Postfix) with ESMTP id 85B9CB12A; Mon, 30 Sep 2019 05:21:44 +0000 (UTC) From: Juergen Gross To: xen-devel@lists.xenproject.org Date: Mon, 30 Sep 2019 07:21:35 +0200 Message-Id: <20190930052135.11257-20-jgross@suse.com> X-Mailer: git-send-email 2.16.4 In-Reply-To: <20190930052135.11257-1-jgross@suse.com> References: <20190930052135.11257-1-jgross@suse.com> Subject: [Xen-devel] [PATCH v5 19/19] xen/sched: add scheduling granularity enum X-BeenThere: xen-devel@lists.xenproject.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: Xen developer discussion List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Cc: Juergen Gross , Stefano Stabellini , Wei Liu , Konrad Rzeszutek Wilk , George Dunlap , Andrew Cooper , Ian Jackson , Tim Deegan , Julien Grall , Jan Beulich , Dario Faggioli , =?utf-8?q?Roger_Pau_Monn=C3=A9?= MIME-Version: 1.0 Errors-To: xen-devel-bounces@lists.xenproject.org Sender: "Xen-devel" Add a scheduling granularity enum ("cpu", "core", "socket") for specification of the scheduling granularity. 
Initially it is set to "cpu", this can be modified by the new boot parameter (x86 only) "sched-gran". According to the selected granularity sched_granularity is set after all cpus are online. A test is added for all sched resources holding the same number of cpus. Fall back to core- or cpu-scheduling in that case. Signed-off-by: Juergen Gross Reviewed-by: Dario Faggioli --- RFC V2: - fixed freeing of sched_res when merging cpus - rename parameter to "sched-gran" (Jan Beulich) - rename parameter option from "thread" to "cpu" (Jan Beulich) V1: - rename scheduler_smp_init() to scheduler_gran_init(), let it be called by cpupool_init() - avoid using literal cpu number 0 in scheduler_percpu_init() (Jan Beulich) - style correction (Jan Beulich) - fallback to smaller granularity instead of panic in case of unbalanced cpu configuration V2: - style changes (Jan Beulich) - introduce CONFIG_HAS_SCHED_GRANULARITY (Jan Beulich) V4: - move code to cpupool.c --- xen/arch/x86/Kconfig | 1 + xen/common/Kconfig | 3 ++ xen/common/cpupool.c | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 3 files changed, 84 insertions(+) diff --git a/xen/arch/x86/Kconfig b/xen/arch/x86/Kconfig index 288dc6c042..3f88adae97 100644 --- a/xen/arch/x86/Kconfig +++ b/xen/arch/x86/Kconfig @@ -22,6 +22,7 @@ config X86 select HAS_PASSTHROUGH select HAS_PCI select HAS_PDX + select HAS_SCHED_GRANULARITY select HAS_UBSAN select HAS_VPCI if !PV_SHIM_EXCLUSIVE && HVM select NEEDS_LIBELF diff --git a/xen/common/Kconfig b/xen/common/Kconfig index 16829f6274..e9247871a8 100644 --- a/xen/common/Kconfig +++ b/xen/common/Kconfig @@ -63,6 +63,9 @@ config HAS_GDBSX config HAS_IOPORTS bool +config HAS_SCHED_GRANULARITY + bool + config NEEDS_LIBELF bool diff --git a/xen/common/cpupool.c b/xen/common/cpupool.c index f7a13c7a4c..4d3adbdd8d 100644 --- a/xen/common/cpupool.c +++ b/xen/common/cpupool.c @@ -17,6 +17,7 @@ #include #include #include +#include #include #include @@ -37,6 +38,83 @@ static DEFINE_SPINLOCK(cpupool_lock); static enum sched_gran __read_mostly opt_sched_granularity = SCHED_GRAN_cpu; static unsigned int __read_mostly sched_granularity = 1; +#ifdef CONFIG_HAS_SCHED_GRANULARITY +static int __init sched_select_granularity(const char *str) +{ + if ( strcmp("cpu", str) == 0 ) + opt_sched_granularity = SCHED_GRAN_cpu; + else if ( strcmp("core", str) == 0 ) + opt_sched_granularity = SCHED_GRAN_core; + else if ( strcmp("socket", str) == 0 ) + opt_sched_granularity = SCHED_GRAN_socket; + else + return -EINVAL; + + return 0; +} +custom_param("sched-gran", sched_select_granularity); +#endif + +static unsigned int __init cpupool_check_granularity(void) +{ + unsigned int cpu; + unsigned int siblings, gran = 0; + + if ( opt_sched_granularity == SCHED_GRAN_cpu ) + return 1; + + for_each_online_cpu ( cpu ) + { + siblings = cpumask_weight(sched_get_opt_cpumask(opt_sched_granularity, + cpu)); + if ( gran == 0 ) + gran = siblings; + else if ( gran != siblings ) + return 0; + } + + sched_disable_smt_switching = true; + + return gran; +} + +/* Setup data for selected scheduler granularity. 
*/ +static void __init cpupool_gran_init(void) +{ + unsigned int gran = 0; + const char *fallback = NULL; + + while ( gran == 0 ) + { + gran = cpupool_check_granularity(); + + if ( gran == 0 ) + { + switch ( opt_sched_granularity ) + { + case SCHED_GRAN_core: + opt_sched_granularity = SCHED_GRAN_cpu; + fallback = "Asymmetric cpu configuration.\n" + "Falling back to sched-gran=cpu.\n"; + break; + case SCHED_GRAN_socket: + opt_sched_granularity = SCHED_GRAN_core; + fallback = "Asymmetric cpu configuration.\n" + "Falling back to sched-gran=core.\n"; + break; + default: + ASSERT_UNREACHABLE(); + break; + } + } + } + + if ( fallback ) + warning_add(fallback); + + sched_granularity = gran; +} + unsigned int cpupool_get_granularity(const struct cpupool *c) { return c ? sched_granularity : 1; @@ -871,6 +949,8 @@ static int __init cpupool_init(void) unsigned int cpu; int err; + cpupool_gran_init(); + cpupool0 = cpupool_create(0, 0, &err); BUG_ON(cpupool0 == NULL); cpupool_put(cpupool0);