@@ -535,6 +535,7 @@ static void cpupool_cpu_remove(unsigned int cpu)
ret = cpupool_unassign_cpu_epilogue(cpupool0);
BUG_ON(ret);
}
+ cpumask_clear_cpu(cpu, &cpupool_free_cpus);
}
/*
@@ -584,20 +585,19 @@ static void cpupool_cpu_remove_forced(unsigned int cpu)
struct cpupool **c;
int ret;
- if ( cpumask_test_cpu(cpu, &cpupool_free_cpus) )
- cpumask_clear_cpu(cpu, &cpupool_free_cpus);
- else
+ for_each_cpupool ( c )
{
- for_each_cpupool(c)
+ if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
{
- if ( cpumask_test_cpu(cpu, (*c)->cpu_valid) )
- {
- ret = cpupool_unassign_cpu(*c, cpu);
- BUG_ON(ret);
- }
+ ret = cpupool_unassign_cpu_prologue(*c, cpu);
+ BUG_ON(ret);
+ ret = cpupool_unassign_cpu_epilogue(*c);
+ BUG_ON(ret);
}
}
+ cpumask_clear_cpu(cpu, &cpupool_free_cpus);
+
rcu_read_lock(&sched_res_rculock);
sched_rm_cpu(cpu);
rcu_read_unlock(&sched_res_rculock);
@@ -407,26 +407,29 @@ static void sched_unit_add_vcpu(struct sched_unit *unit, struct vcpu *v)
unit->runstate_cnt[v->runstate.state]++;
}
-static struct sched_unit *sched_alloc_unit(struct vcpu *v)
+static struct sched_unit *sched_alloc_unit_mem(void)
{
- struct sched_unit *unit, **prev_unit;
- struct domain *d = v->domain;
- unsigned int gran = d->cpupool ? d->cpupool->granularity : 1;
+ struct sched_unit *unit;
- for_each_sched_unit ( d, unit )
- if ( unit->vcpu_list->vcpu_id / gran == v->vcpu_id / gran )
- break;
+ unit = xzalloc(struct sched_unit);
+ if ( !unit )
+ return NULL;
- if ( unit )
+ if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) ||
+ !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) ||
+ !zalloc_cpumask_var(&unit->cpu_soft_affinity) )
{
- sched_unit_add_vcpu(unit, v);
- return unit;
+ sched_free_unit_mem(unit);
+ unit = NULL;
}
- if ( (unit = xzalloc(struct sched_unit)) == NULL )
- return NULL;
+ return unit;
+}
+
+static void sched_domain_insert_unit(struct sched_unit *unit, struct domain *d)
+{
+ struct sched_unit **prev_unit;
- sched_unit_add_vcpu(unit, v);
unit->domain = d;
for ( prev_unit = &d->sched_unit_list; *prev_unit;
@@ -437,17 +440,31 @@ static struct sched_unit *sched_alloc_unit(struct vcpu *v)
unit->next_in_list = *prev_unit;
*prev_unit = unit;
+}
- if ( !zalloc_cpumask_var(&unit->cpu_hard_affinity) ||
- !zalloc_cpumask_var(&unit->cpu_hard_affinity_saved) ||
- !zalloc_cpumask_var(&unit->cpu_soft_affinity) )
- goto fail;
+static struct sched_unit *sched_alloc_unit(struct vcpu *v)
+{
+ struct sched_unit *unit;
+ struct domain *d = v->domain;
+ unsigned int gran = d->cpupool ? d->cpupool->granularity : 1;
- return unit;
+ for_each_sched_unit ( d, unit )
+ if ( unit->vcpu_list->vcpu_id / gran == v->vcpu_id / gran )
+ break;
- fail:
- sched_free_unit(unit, v);
- return NULL;
+ if ( unit )
+ {
+ sched_unit_add_vcpu(unit, v);
+ return unit;
+ }
+
+ if ( (unit = sched_alloc_unit_mem()) == NULL )
+ return NULL;
+
+ sched_unit_add_vcpu(unit, v);
+ sched_domain_insert_unit(unit, d);
+
+ return unit;
}
static unsigned int sched_select_initial_cpu(const struct vcpu *v)
@@ -2370,18 +2387,28 @@ static void poll_timer_fn(void *data)
vcpu_unblock(v);
}
-static int cpu_schedule_up(unsigned int cpu)
+static struct sched_resource *sched_alloc_res(void)
{
struct sched_resource *sd;
sd = xzalloc(struct sched_resource);
if ( sd == NULL )
- return -ENOMEM;
+ return NULL;
if ( !zalloc_cpumask_var(&sd->cpus) )
{
xfree(sd);
- return -ENOMEM;
+ return NULL;
}
+ return sd;
+}
+
+static int cpu_schedule_up(unsigned int cpu)
+{
+ struct sched_resource *sd;
+
+ sd = sched_alloc_res();
+ if ( sd == NULL )
+ return -ENOMEM;
sd->processor = cpu;
cpumask_copy(sd->cpus, cpumask_of(cpu));
@@ -2431,6 +2458,8 @@ static void sched_res_free(struct rcu_head *head)
struct sched_resource *sd = container_of(head, struct sched_resource, rcu);
free_cpumask_var(sd->cpus);
+ if ( sd->sched_unit_idle )
+ sched_free_unit_mem(sd->sched_unit_idle);
xfree(sd);
}
@@ -2445,6 +2474,8 @@ static void cpu_schedule_down(unsigned int cpu)
kill_timer(&sd->s_timer);
set_sched_res(cpu, NULL);
+ /* Keep idle unit. */
+ sd->sched_unit_idle = NULL;
call_rcu(&sd->rcu, sched_res_free);
rcu_read_unlock(&sched_res_rculock);
@@ -2524,6 +2555,30 @@ static struct notifier_block cpu_schedule_nfb = {
.notifier_call = cpu_schedule_callback
};
+static const cpumask_t *sched_get_opt_cpumask(enum sched_gran opt,
+ unsigned int cpu)
+{
+ const cpumask_t *mask;
+
+ switch ( opt )
+ {
+ case SCHED_GRAN_cpu:
+ mask = cpumask_of(cpu);
+ break;
+ case SCHED_GRAN_core:
+ mask = per_cpu(cpu_sibling_mask, cpu);
+ break;
+ case SCHED_GRAN_socket:
+ mask = per_cpu(cpu_core_mask, cpu);
+ break;
+ default:
+ ASSERT_UNREACHABLE();
+ return NULL;
+ }
+
+ return mask;
+}
+
/* Initialise the data structures. */
void __init scheduler_init(void)
{
@@ -2682,6 +2737,46 @@ int schedule_cpu_add(unsigned int cpu, struct cpupool *c)
*/
old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
+ if ( c->granularity > 1 )
+ {
+ const cpumask_t *mask;
+ unsigned int cpu_iter, idx = 0;
+ struct sched_unit *old_unit, *master_unit;
+ struct sched_resource *sd_old;
+
+ /*
+ * We need to merge multiple idle_vcpu units and sched_resource structs
+ * into one. As the free cpus all share the same lock we are fine doing
+ * that now. The worst that could happen would be someone waiting for
+ * the lock, thus dereferencing sched_res->schedule_lock. This is the
+ * reason we are freeing struct sched_res via call_rcu() to avoid the
+ * lock pointer suddenly disappearing.
+ */
+ mask = sched_get_opt_cpumask(c->opt_granularity, cpu);
+ master_unit = idle_vcpu[cpu]->sched_unit;
+
+ for_each_cpu ( cpu_iter, mask )
+ {
+ if ( idx )
+ cpumask_clear_cpu(cpu_iter, sched_res_mask);
+
+ per_cpu(sched_res_idx, cpu_iter) = idx++;
+
+ if ( cpu == cpu_iter )
+ continue;
+
+ old_unit = idle_vcpu[cpu_iter]->sched_unit;
+ sd_old = get_sched_res(cpu_iter);
+ kill_timer(&sd_old->s_timer);
+ idle_vcpu[cpu_iter]->sched_unit = master_unit;
+ master_unit->runstate_cnt[RUNSTATE_running]++;
+ set_sched_res(cpu_iter, sd);
+ cpumask_set_cpu(cpu_iter, sd->cpus);
+
+ call_rcu(&sd_old->rcu, sched_res_free);
+ }
+ }
+
new_lock = sched_switch_sched(new_ops, cpu, ppriv, vpriv);
sd->scheduler = new_ops;
@@ -2719,33 +2814,100 @@ out:
*/
int schedule_cpu_rm(unsigned int cpu)
{
- struct vcpu *idle;
void *ppriv_old, *vpriv_old;
- struct sched_resource *sd;
+ struct sched_resource *sd, **sd_new = NULL;
+ struct sched_unit *unit;
struct scheduler *old_ops;
spinlock_t *old_lock;
unsigned long flags;
+ int idx, ret = -ENOMEM;
+ unsigned int cpu_iter;
rcu_read_lock(&sched_res_rculock);
sd = get_sched_res(cpu);
old_ops = sd->scheduler;
+ if ( sd->granularity > 1 )
+ {
+ sd_new = xmalloc_array(struct sched_resource *, sd->granularity - 1);
+ if ( !sd_new )
+ goto out;
+ for ( idx = 0; idx < sd->granularity - 1; idx++ )
+ {
+ sd_new[idx] = sched_alloc_res();
+ if ( sd_new[idx] )
+ {
+ sd_new[idx]->sched_unit_idle = sched_alloc_unit_mem();
+ if ( !sd_new[idx]->sched_unit_idle )
+ {
+ sched_res_free(&sd_new[idx]->rcu);
+ sd_new[idx] = NULL;
+ }
+ }
+ if ( !sd_new[idx] )
+ {
+ for ( idx--; idx >= 0; idx-- )
+ sched_res_free(&sd_new[idx]->rcu);
+ goto out;
+ }
+ sd_new[idx]->curr = sd_new[idx]->sched_unit_idle;
+ sd_new[idx]->scheduler = &sched_idle_ops;
+ sd_new[idx]->granularity = 1;
+
+ /* We want the lock not to change when replacing the resource. */
+ sd_new[idx]->schedule_lock = sd->schedule_lock;
+ }
+ }
+
+ ret = 0;
ASSERT(sd->cpupool != NULL);
ASSERT(cpumask_test_cpu(cpu, &cpupool_free_cpus));
ASSERT(!cpumask_test_cpu(cpu, sd->cpupool->cpu_valid));
- idle = idle_vcpu[cpu];
-
sched_do_tick_suspend(old_ops, cpu);
/* See comment in schedule_cpu_add() regarding lock switching. */
old_lock = pcpu_schedule_lock_irqsave(cpu, &flags);
- vpriv_old = idle->sched_unit->priv;
+ vpriv_old = idle_vcpu[cpu]->sched_unit->priv;
ppriv_old = sd->sched_priv;
- idle->sched_unit->priv = NULL;
+ idx = 0;
+ for_each_cpu ( cpu_iter, sd->cpus )
+ {
+ per_cpu(sched_res_idx, cpu_iter) = 0;
+ if ( cpu_iter == cpu )
+ {
+ idle_vcpu[cpu_iter]->sched_unit->priv = NULL;
+ }
+ else
+ {
+ /* Initialize unit. */
+ unit = sd_new[idx]->sched_unit_idle;
+ unit->res = sd_new[idx];
+ unit->is_running = true;
+ sched_unit_add_vcpu(unit, idle_vcpu[cpu_iter]);
+ sched_domain_insert_unit(unit, idle_vcpu[cpu_iter]->domain);
+
+ /* Adjust cpu masks of resources (old and new). */
+ cpumask_clear_cpu(cpu_iter, sd->cpus);
+ cpumask_set_cpu(cpu_iter, sd_new[idx]->cpus);
+
+ /* Init timer. */
+ init_timer(&sd_new[idx]->s_timer, s_timer_fn, NULL, cpu_iter);
+
+ /* Last resource initializations and insert resource pointer. */
+ sd_new[idx]->processor = cpu_iter;
+ set_sched_res(cpu_iter, sd_new[idx]);
+
+ /* Last action: set the new lock pointer. */
+ smp_mb();
+ sd_new[idx]->schedule_lock = &sched_free_cpu_lock;
+
+ idx++;
+ }
+ }
sd->scheduler = &sched_idle_ops;
sd->sched_priv = NULL;
@@ -2763,9 +2925,11 @@ int schedule_cpu_rm(unsigned int cpu)
sd->granularity = 1;
sd->cpupool = NULL;
+out:
rcu_read_unlock(&sched_res_rculock);
+ xfree(sd_new);
- return 0;
+ return ret;
}
struct scheduler *scheduler_get_default(void)
With core scheduling active, schedule_cpu_[add/rm]() has to cope with
different scheduling granularities: a cpu not in any cpupool is subject
to granularity 1 (cpu scheduling), while a cpu in a cpupool might be in
a scheduling resource with more than one cpu. Handle that by having
arrays of old/new pdata and vdata and looping over those where
appropriate.

Additionally, the scheduling resource(s) must either be merged or split.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/cpupool.c  |  18 ++--
 xen/common/schedule.c | 226 +++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 204 insertions(+), 40 deletions(-)
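For illustration, here is a minimal standalone sketch (plain C, not Xen
code) of the merge step schedule_cpu_add() performs for granularity > 1.
All names in it (toy_res, toy_merge, res_of, res_idx) are made up: each
cpu in the granularity mask is pointed at the resource of the cpu being
added and remembers its index within that resource, loosely mirroring the
per_cpu(sched_res_idx, cpu_iter), set_sched_res() and
cpumask_set_cpu(cpu_iter, sd->cpus) calls in the hunk above; freeing the
superseded resources via call_rcu() is left out.

/* toy_merge.c - standalone illustration only, not Xen code */
#include <stdio.h>

#define NR_CPUS 8

struct toy_res {
    int master;             /* cpu whose resource survives the merge        */
    int cpus[NR_CPUS];      /* stands in for the cpumask sd->cpus           */
};

static struct toy_res res[NR_CPUS];      /* one resource per cpu initially  */
static struct toy_res *res_of[NR_CPUS];  /* per-cpu resource pointer        */
static int res_idx[NR_CPUS];             /* index of a cpu in its resource  */

/* Merge all cpus set in mask[] into the resource of cpu. */
static void toy_merge(int cpu, const int mask[NR_CPUS])
{
    int idx = 0, i;

    for ( i = 0; i < NR_CPUS; i++ )
    {
        if ( !mask[i] )
            continue;

        res_idx[i] = idx++;        /* mirrors per_cpu(sched_res_idx, ...)   */

        if ( i == cpu )
            continue;              /* the added cpu keeps its own resource  */

        res_of[i] = &res[cpu];     /* mirrors set_sched_res(cpu_iter, sd)   */
        res[cpu].cpus[i] = 1;      /* mirrors cpumask_set_cpu(.., sd->cpus) */
        /* the real code also frees the old resource via call_rcu() here    */
    }
}

int main(void)
{
    /* cpus 2 and 3 are siblings; cpu 2 is added with core granularity */
    const int siblings[NR_CPUS] = { 0, 0, 1, 1, 0, 0, 0, 0 };
    int i;

    for ( i = 0; i < NR_CPUS; i++ )
    {
        res[i].master = i;
        res[i].cpus[i] = 1;
        res_of[i] = &res[i];
    }

    toy_merge(2, siblings);

    for ( i = 2; i < 4; i++ )
        printf("cpu%d -> resource of cpu%d, idx %d\n",
               i, res_of[i]->master, res_idx[i]);

    return 0;
}

Built with any C compiler it reports that cpu3 now uses cpu2's resource
with index 1, which is the state schedule_cpu_add() establishes for
sibling cpus; schedule_cpu_rm() undoes this by handing every sibling a
freshly allocated granularity-1 resource and idle unit again.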