Message ID | 20190506065644.7415-43-jgross@suse.com (mailing list archive)
---|---
State | Superseded
Series | xen: add core scheduling support
>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
> --- a/xen/arch/x86/domain.c
> +++ b/xen/arch/x86/domain.c
> @@ -154,6 +154,24 @@ static void idle_loop(void)
>  }
>  }
>
> +/*
> + * Idle loop for siblings of active schedule items.
> + * We don't do any standard idle work like tasklets, page scrubbing or
> + * livepatching.
> + * Use default_idle() in order to simulate v->is_urgent.

I guess I'm missing a part of the description which explains all this:
What's wrong with doing scrubbing work, for example? Why is doing
tasklet work not okay, but softirqs are? What is the deal with
v->is_urgent, i.e. what justifies not entering a decent power saving
mode here on Intel, but doing so on AMD?

> --- a/xen/include/asm-x86/smp.h
> +++ b/xen/include/asm-x86/smp.h
> @@ -76,6 +76,9 @@ void set_nr_sockets(void);
>  /* Representing HT and core siblings in each socket. */
>  extern cpumask_t **socket_cpumask;
>
> +#define get_cpu_current(cpu) \
> +    (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu)

Yet another, slightly different notion of "current". If "current"
itself is not suitable (I can't immediately see why that would be, but
I also didn't look at all the scheduler specific changes earlier in
this series), why isn't per_cpu(curr_vcpu, cpu) either?

Jan
On 16/05/2019 15:05, Jan Beulich wrote:
>>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
>> --- a/xen/arch/x86/domain.c
>> +++ b/xen/arch/x86/domain.c
>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>  }
>>  }
>>
>> +/*
>> + * Idle loop for siblings of active schedule items.
>> + * We don't do any standard idle work like tasklets, page scrubbing or
>> + * livepatching.
>> + * Use default_idle() in order to simulate v->is_urgent.
>
> I guess I'm missing a part of the description which explains all this:
> What's wrong with doing scrubbing work, for example? Why is
> doing tasklet work not okay, but softirqs are? What is the deal with
> v->is_urgent, i.e. what justifies not entering a decent power
> saving mode here on Intel, but doing so on AMD?

One of the reasons for using core scheduling is to avoid running vcpus
of different domains on the same core in order to minimize the chances
for side channel attacks on data of other domains. Not allowing
scrubbing or tasklets here is meant to avoid accessing data of other
domains.

With core scheduling we can be sure the other thread is active
(otherwise we would schedule the idle item), so hoping to save power by
using mwait is moot.

>> --- a/xen/include/asm-x86/smp.h
>> +++ b/xen/include/asm-x86/smp.h
>> @@ -76,6 +76,9 @@ void set_nr_sockets(void);
>>  /* Representing HT and core siblings in each socket. */
>>  extern cpumask_t **socket_cpumask;
>>
>> +#define get_cpu_current(cpu) \
>> +    (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu)
>
> Yet another, slightly different notion of "current". If "current"
> itself is not suitable (I can't immediately see why that would be,
> but I also didn't look at all the scheduler specific changes earlier
> in this series), why isn't per_cpu(curr_vcpu, cpu) either?

current is always the vcpu running on the current physical cpu.
curr_vcpu is the vcpu which was running in guest mode last (this avoids
the need to save/restore context in case a vcpu is blocked for a short
time without another guest vcpu running on the physical cpu in
between), so with current being idle the two can differ.

Here I need the "current" of another physical cpu, which is not easily
available.

Juergen
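The trick behind get_cpu_info_from_stack() can be sketched in plain C: since every per-cpu stack is STACK_SIZE-aligned and the cpu_info block lives at its very top, any pointer into the stack can be rounded up to the stack's end to locate it. STACK_SIZE and the struct layout below are simplified assumptions for illustration, not Xen's actual definitions:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define STACK_SIZE (1UL << 15)   /* assumption: power-of-two per-cpu stack size */

struct cpu_info {
    unsigned long current_vcpu;  /* stand-in for Xen's struct vcpu * member */
};

/*
 * Round any in-stack address up to the end of the (aligned) stack, then
 * step back one struct: no per-cpu variable access needed, which is why
 * this also works for *another* cpu's stack given its stack base.
 */
static struct cpu_info *get_cpu_info_from_stack(unsigned long sp)
{
    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
}
```

Every sp inside one stack maps to the same cpu_info, which is what lets the macro derive a remote cpu's "current" from stack_base[cpu] alone.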
>>> On 16.05.19 at 15:51, <jgross@suse.com> wrote:
> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
>>> --- a/xen/arch/x86/domain.c
>>> +++ b/xen/arch/x86/domain.c
>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>  }
>>>  }
>>>
>>> +/*
>>> + * Idle loop for siblings of active schedule items.
>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>> + * livepatching.
>>> + * Use default_idle() in order to simulate v->is_urgent.
>>
>> I guess I'm missing a part of the description which explains all this:
>> What's wrong with doing scrubbing work, for example? Why is
>> doing tasklet work not okay, but softirqs are? What is the deal with
>> v->is_urgent, i.e. what justifies not entering a decent power
>> saving mode here on Intel, but doing so on AMD?
>
> One of the reasons for using core scheduling is to avoid running vcpus
> of different domains on the same core in order to minimize the chances
> for side channel attacks to data of other domains. Not allowing
> scrubbing or tasklets here is due to avoid accessing data of other
> domains.

So how is running softirqs okay then? And how is scrubbing accessing
other domains' data?

> As with core scheduling we can be sure the other thread is active
> (otherwise we would schedule the idle item) and hoping for saving power
> by using mwait is moot.

Saving power may be indirect, by the CPU re-arranging resource
assignment between threads when one goes idle. I have no idea whether
they do this when entering C1, or only when entering deeper C states.

And anyway - I'm still none the wiser as to the v->is_urgent
relationship.

>>> --- a/xen/include/asm-x86/smp.h
>>> +++ b/xen/include/asm-x86/smp.h
>>> @@ -76,6 +76,9 @@ void set_nr_sockets(void);
>>>  /* Representing HT and core siblings in each socket. */
>>>  extern cpumask_t **socket_cpumask;
>>>
>>> +#define get_cpu_current(cpu) \
>>> +    (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu)
>>
>> Yet another, slightly different notion of "current". If "current"
>> itself is not suitable (I can't immediately see why that would be,
>> but I also didn't look at all the scheduler specific changes earlier
>> in this series), why isn't per_cpu(curr_vcpu, cpu) either?
>
> current is always the vcpu running on the current physical cpu.
> curr_vcpu is the vcpu which was the one running in guest mode last
> (this avoids the need to save/restore context in case a vcpu is
> blocked for a short time without another guest vcpu running on the
> physical cpu in between), so with current being idle the two can
> differ.
>
> Here I need "current" from another physical cpu which is not easily
> available.

Oh, right - I should have been able to spot this.

Jan
On 16/05/2019 16:41, Jan Beulich wrote:
>>>> On 16.05.19 at 15:51, <jgross@suse.com> wrote:
>> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
>>>> --- a/xen/arch/x86/domain.c
>>>> +++ b/xen/arch/x86/domain.c
>>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>>  }
>>>>  }
>>>>
>>>> +/*
>>>> + * Idle loop for siblings of active schedule items.
>>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>>> + * livepatching.
>>>> + * Use default_idle() in order to simulate v->is_urgent.
>>>
>>> I guess I'm missing a part of the description which explains all this:
>>> What's wrong with doing scrubbing work, for example? Why is
>>> doing tasklet work not okay, but softirqs are? What is the deal with
>>> v->is_urgent, i.e. what justifies not entering a decent power
>>> saving mode here on Intel, but doing so on AMD?
>>
>> One of the reasons for using core scheduling is to avoid running vcpus
>> of different domains on the same core in order to minimize the chances
>> for side channel attacks to data of other domains. Not allowing
>> scrubbing or tasklets here is due to avoid accessing data of other
>> domains.
>
> So how is running softirqs okay then? And how is scrubbing accessing
> other domains' data?

Right now I'm not sure whether it is a good idea to block any softirqs.
We definitely need to process scheduling requests, and I believe RCU
and tasklets, too. The tlbflush one should be uncritical, so the timer
one is the remaining one which might be questionable. This can be
fine-tuned later IMO, e.g. by defining a softirq mask of critical
softirqs to block, and eventually splitting up e.g. the timer and
tasklet softirqs into critical and uncritical ones.

Scrubbing will probably pull the cache lines of the dirty pages into
the L1 cache of the cpu. To me this sounds problematic. In case
scrubbing is fine to do, as there is no risk associated with it, I'm
happy to add it back in.

>> As with core scheduling we can be sure the other thread is active
>> (otherwise we would schedule the idle item) and hoping for saving power
>> by using mwait is moot.
>
> Saving power may be indirect, by the CPU re-arranging
> resource assignment between threads when one goes idle.
> I have no idea whether they do this when entering C1, or
> only when entering deeper C states.

SDM Vol. 3 chapter 8.10.1 "HLT instruction":

"Here shared resources that were being used by the halted logical
processor become available to active logical processors, allowing them
to execute at greater efficiency."

> And anyway - I'm still none the wiser as to the v->is_urgent
> relationship.

With v->is_urgent set, today's idle loop will drop into default_idle().
I can remove this sentence in case it is just confusing.

Juergen
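The fine-tuning Juergen sketches above — servicing only softirqs that are safe on a sibling-idle cpu — boils down to masking the pending set. The softirq indices below are illustrative stand-ins, not Xen's real numbering, and which softirqs end up blocked is exactly the open question of the thread; here only the timer one (the "questionable" one) is filtered out:

```c
#include <assert.h>

/* Illustrative softirq indices; Xen's actual numbering may differ. */
enum {
    TIMER_SOFTIRQ,
    RCU_SOFTIRQ,
    SCHED_SLAVE_SOFTIRQ,
    SCHEDULE_SOFTIRQ,
    NEW_TLBFLUSH_CLOCK_PERIOD_SOFTIRQ,
    TASKLET_SOFTIRQ,
    NR_SOFTIRQS
};

#define SOFTIRQ_BIT(n) (1u << (n))

/* Softirqs a sibling-idle cpu would NOT service (assumption: only the
 * timer softirq, per the discussion; scheduling, RCU, tasklets and
 * tlbflush all remain serviceable). */
#define BLOCKED_SOFTIRQS SOFTIRQ_BIT(TIMER_SOFTIRQ)

/* Reduce the raw pending mask to the softirqs we are willing to run. */
static unsigned int filter_pending(unsigned int pending)
{
    return pending & ~BLOCKED_SOFTIRQS;
}
```

Splitting a softirq into critical and uncritical halves, as suggested, would then just mean introducing two indices where there is one today and placing only one of them in the blocked mask.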
>>> On 17.05.19 at 07:13, <jgross@suse.com> wrote:
> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>> On 16.05.19 at 15:51, <jgross@suse.com> wrote:
>>> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
>>>>> --- a/xen/arch/x86/domain.c
>>>>> +++ b/xen/arch/x86/domain.c
>>>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>>>  }
>>>>>  }
>>>>>
>>>>> +/*
>>>>> + * Idle loop for siblings of active schedule items.
>>>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>>>> + * livepatching.
>>>>> + * Use default_idle() in order to simulate v->is_urgent.
>>>>
>>>> I guess I'm missing a part of the description which explains all this:
>>>> What's wrong with doing scrubbing work, for example? Why is
>>>> doing tasklet work not okay, but softirqs are? What is the deal with
>>>> v->is_urgent, i.e. what justifies not entering a decent power
>>>> saving mode here on Intel, but doing so on AMD?
>>>
>>> One of the reasons for using core scheduling is to avoid running vcpus
>>> of different domains on the same core in order to minimize the chances
>>> for side channel attacks to data of other domains. Not allowing
>>> scrubbing or tasklets here is due to avoid accessing data of other
>>> domains.
>>
>> So how is running softirqs okay then? And how is scrubbing accessing
>> other domains' data?
>
> Right now I'm not sure whether it is a good idea to block any softirqs.
> We definitely need to process scheduling requests and I believe RCU and
> tasklets, too. The tlbflush one should be uncritical, so timers is the
> remaining one which might be questionable. This can be fine-tuned later
> IMO e.g. by defining a softirq mask of critical softirqs to block and
> eventually splitting up e.g. timer and tasklet softirqs into critical
> and uncritical ones.

Well, okay, but please add an abridged version of this to the patch
description then.

> Scrubbing will probably pull the cache lines of the dirty pages into
> the L1 cache of the cpu. For me this sounds problematic. In case we
> are fine to do scrubbing as there is no risk associated I'm fine to add
> it back in.

Well, of course there's going to be a brief period of time where a
cache line will be present in CPU internal buffers (it's not just the
cache after all, as we've learned with XSA-297). So I can certainly buy
that when using core granularity you don't want to scrub on the other
thread. But what about the socket granularity case? Scrubbing on fully
idle cores should still be fine, I would think.

>>> As with core scheduling we can be sure the other thread is active
>>> (otherwise we would schedule the idle item) and hoping for saving power
>>> by using mwait is moot.
>>
>> Saving power may be indirect, by the CPU re-arranging
>> resource assignment between threads when one goes idle.
>> I have no idea whether they do this when entering C1, or
>> only when entering deeper C states.
>
> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>
> "Here shared resources that were being used by the halted logical
> processor become available to active logical processors, allowing them
> to execute at greater efficiency."

To be honest, this is too broad/generic a statement to fully trust it,
judging from other areas of the SDM. And then, as per above, what about
the socket granularity case? Putting entirely idle cores to sleep is
surely worthwhile?

>> And anyway - I'm still none the wiser as to the v->is_urgent
>> relationship.
>
> With v->is_urgent set today's idle loop will drop into default_idle().
> I can remove this sentence in case it is just confusing.

I'd prefer if the connection would become more obvious. One needs to go
from ->is_urgent via ->urgent_count to sched_has_urgent_vcpu() to find
where the described behavior really lives.

What's worse though: This won't work as intended on AMD at all. I don't
think it's correct to fall back to default_idle() in this case. Instead
sched_has_urgent_vcpu() returning true should amount to the same effect
as max_cstate being set to 1. There's
(a) no reason not to use MWAIT on Intel CPUs in this case, if MWAIT can
    enter C1, and
(b) a strong need to use MWAIT on (at least) AMD Fam17, or else it
    won't be C1 that gets entered.
I'll see about making a patch in due course.

Jan
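The semantics Jan proposes — an urgent vCPU capping the C state at 1 rather than forcing a plain HLT — can be modeled as selecting from a C-state table under a limit. The descriptor layout and names below are made up for illustration; the point is only that an MWAIT-entered C1 stays eligible under the cap:

```c
#include <assert.h>
#include <stddef.h>

/* Made-up C-state descriptor; only the fields needed for the sketch. */
struct cstate {
    unsigned int type;   /* ACPI C-state number: 1, 2, 3, ... */
    int uses_mwait;      /* entered via MWAIT rather than HLT */
};

/*
 * Pick the deepest state whose type does not exceed the limit. With an
 * urgent vCPU present the limit is 1, yet an MWAIT-entered C1 remains
 * eligible (point (a)); on CPUs where HLT enters something deeper than
 * C1, the MWAIT entry is what actually guarantees C1 (point (b)).
 */
static const struct cstate *pick_cstate(const struct cstate *tbl, size_t n,
                                        unsigned int limit)
{
    const struct cstate *best = NULL;
    size_t i;

    for ( i = 0; i < n; i++ )
        if ( tbl[i].type <= limit )
            best = &tbl[i];
    return best;
}
```

With this shape, "urgent" and "max_cstate=1" naturally become the same mechanism: both just lower the limit passed in.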
On 17/05/2019 08:57, Jan Beulich wrote:
>>>> On 17.05.19 at 07:13, <jgross@suse.com> wrote:
>> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>>> On 16.05.19 at 15:51, <jgross@suse.com> wrote:
>>>> On 16/05/2019 15:05, Jan Beulich wrote:
>>>>>>>> On 06.05.19 at 08:56, <jgross@suse.com> wrote:
>>>>>> --- a/xen/arch/x86/domain.c
>>>>>> +++ b/xen/arch/x86/domain.c
>>>>>> @@ -154,6 +154,24 @@ static void idle_loop(void)
>>>>>>  }
>>>>>>  }
>>>>>>
>>>>>> +/*
>>>>>> + * Idle loop for siblings of active schedule items.
>>>>>> + * We don't do any standard idle work like tasklets, page scrubbing or
>>>>>> + * livepatching.
>>>>>> + * Use default_idle() in order to simulate v->is_urgent.
>>>>>
>>>>> I guess I'm missing a part of the description which explains all this:
>>>>> What's wrong with doing scrubbing work, for example? Why is
>>>>> doing tasklet work not okay, but softirqs are? What is the deal with
>>>>> v->is_urgent, i.e. what justifies not entering a decent power
>>>>> saving mode here on Intel, but doing so on AMD?
>>>>
>>>> One of the reasons for using core scheduling is to avoid running vcpus
>>>> of different domains on the same core in order to minimize the chances
>>>> for side channel attacks to data of other domains. Not allowing
>>>> scrubbing or tasklets here is due to avoid accessing data of other
>>>> domains.
>>>
>>> So how is running softirqs okay then? And how is scrubbing accessing
>>> other domains' data?
>>
>> Right now I'm not sure whether it is a good idea to block any softirqs.
>> We definitely need to process scheduling requests and I believe RCU and
>> tasklets, too. The tlbflush one should be uncritical, so timers is the
>> remaining one which might be questionable. This can be fine-tuned later
>> IMO e.g. by defining a softirq mask of critical softirqs to block and
>> eventually splitting up e.g. timer and tasklet softirqs into critical
>> and uncritical ones.
>
> Well, okay, but please add an abridged version of this to the patch
> description then.

Okay.

>> Scrubbing will probably pull the cache lines of the dirty pages into
>> the L1 cache of the cpu. For me this sounds problematic. In case we
>> are fine to do scrubbing as there is no risk associated I'm fine to add
>> it back in.
>
> Well, of course there's going to be a brief period of time where
> a cache line will be present in CPU internal buffers (it's not just the
> cache after all, as we've learned with XSA-297). So I can certainly
> buy that when using core granularity you don't want to scrub on
> the other thread. But what about the socket granularity case?
> Scrubbing on fully idle cores should still be fine, I would think.

I think this would depend on the reason for selecting socket
scheduling. I'd at least want to have a way to select that, as I could
think of e.g. L3-cache side channel attacks, too. So maybe I could add
a patch on top adding a sub-option to the sched-gran parameter which
will allow (or disallow?) scrubbing on idle cores or threads.

>>>> As with core scheduling we can be sure the other thread is active
>>>> (otherwise we would schedule the idle item) and hoping for saving power
>>>> by using mwait is moot.
>>>
>>> Saving power may be indirect, by the CPU re-arranging
>>> resource assignment between threads when one goes idle.
>>> I have no idea whether they do this when entering C1, or
>>> only when entering deeper C states.
>>
>> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>>
>> "Here shared resources that were being used by the halted logical
>> processor become available to active logical processors, allowing them
>> to execute at greater efficiency."
>
> To be honest, this is too broad/generic a statement to fully
> trust it, judging from other areas of the SDM. And then, as
> per above, what about the socket granularity case? Putting
> entirely idle cores to sleep is surely worthwhile?

Yes, I assume it is. OTOH this might affect context switches badly, as
the reaction time of the coordinated switch will rise. Maybe a good
reason for another sub-option?

>>> And anyway - I'm still none the wiser as to the v->is_urgent
>>> relationship.
>>
>> With v->is_urgent set today's idle loop will drop into default_idle().
>> I can remove this sentence in case it is just confusing.
>
> I'd prefer if the connection would become more obvious. One
> needs to go from ->is_urgent via ->urgent_count to
> sched_has_urgent_vcpu() to find where the described
> behavior really lives.
>
> What's worse though: This won't work as intended on AMD
> at all. I don't think it's correct to fall back to default_idle() in
> this case. Instead sched_has_urgent_vcpu() returning true
> should amount to the same effect as max_cstate being set
> to 1. There's
> (a) no reason not to use MWAIT on Intel CPUs in this case,
> if MWAIT can enter C1, and
> (b) a strong need to use MWAIT on (at least) AMD Fam17,
> or else it won't be C1 that gets entered.
> I'll see about making a patch in due course.

Thanks. Would you mind doing it in a way that lets the caller specify
max_cstate? This would remove the need to call sched_has_urgent_vcpu()
deep down in the idle handling, and I could re-use it for my purpose.

Juergen
>>> On 17.05.19 at 09:48, <jgross@suse.com> wrote:
> On 17/05/2019 08:57, Jan Beulich wrote:
>>>>> On 17.05.19 at 07:13, <jgross@suse.com> wrote:
>>> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>>>> On 16.05.19 at 15:51, <jgross@suse.com> wrote:
>>>>> As with core scheduling we can be sure the other thread is active
>>>>> (otherwise we would schedule the idle item) and hoping for saving power
>>>>> by using mwait is moot.
>>>>
>>>> Saving power may be indirect, by the CPU re-arranging
>>>> resource assignment between threads when one goes idle.
>>>> I have no idea whether they do this when entering C1, or
>>>> only when entering deeper C states.
>>>
>>> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>>>
>>> "Here shared resources that were being used by the halted logical
>>> processor become available to active logical processors, allowing them
>>> to execute at greater efficiency."
>>
>> To be honest, this is too broad/generic a statement to fully
>> trust it, judging from other areas of the SDM. And then, as
>> per above, what about the socket granularity case? Putting
>> entirely idle cores to sleep is surely worthwhile?
>
> Yes, I assume it is. OTOH this might affect context switches badly
> as the reaction time for the coordinated switch will rise. Maybe a
> good reason for another sub-option?

While I agree that fine grained control is useful, I'm seeing an
increasing risk of there ending up being too many controls to actually
be certain in the end that all possible combinations work correctly.

>>>> And anyway - I'm still none the wiser as to the v->is_urgent
>>>> relationship.
>>>
>>> With v->is_urgent set today's idle loop will drop into default_idle().
>>> I can remove this sentence in case it is just confusing.
>>
>> I'd prefer if the connection would become more obvious. One
>> needs to go from ->is_urgent via ->urgent_count to
>> sched_has_urgent_vcpu() to find where the described
>> behavior really lives.
>>
>> What's worse though: This won't work as intended on AMD
>> at all. I don't think it's correct to fall back to default_idle() in
>> this case. Instead sched_has_urgent_vcpu() returning true
>> should amount to the same effect as max_cstate being set
>> to 1. There's
>> (a) no reason not to use MWAIT on Intel CPUs in this case,
>> if MWAIT can enter C1, and
>> (b) a strong need to use MWAIT on (at least) AMD Fam17,
>> or else it won't be C1 that gets entered.
>> I'll see about making a patch in due course.
>
> Thanks. Would you mind doing it in a way that the caller can specify
> max_cstate? This would remove the need to call sched_has_urgent_vcpu()
> deep down the idle handling and I could re-use it for my purpose.

Hmm, to be honest I'm not fancying giving a parameter to
default_idle(), pm_idle(), and friends. Conceptually it is not the
business of the callers to control the C states to be used.

What about the opposite: You simply mark the idle (v)CPUs in question
as "urgent", thus achieving the intended effect as well.

Jan
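Jan's counter-proposal — mark the sibling-idle vCPU "urgent" so the existing sched_has_urgent_vcpu() path caps the C state — amounts to bumping a per-cpu counter. A minimal model of that accounting (field and function names follow the thread's references to Xen, but the code is a simplified single-cpu sketch):

```c
#include <assert.h>

/* Simplified model of a vCPU's urgency flag. */
struct vcpu_model {
    int is_urgent;
};

static int urgent_count;   /* per-cpu in Xen; a single counter here */

/* Keep the counter in sync with the flag; idempotent on repeat calls. */
static void vcpu_urgent_count_update(struct vcpu_model *v, int urgent)
{
    if ( urgent && !v->is_urgent )
    {
        v->is_urgent = 1;
        urgent_count++;
    }
    else if ( !urgent && v->is_urgent )
    {
        v->is_urgent = 0;
        urgent_count--;
    }
}

/* The idle path consults this to limit how deep a C state to enter. */
static int sched_has_urgent_vcpu(void)
{
    return urgent_count > 0;
}
```

Marking the sibling-idle vCPU urgent on schedule-in and clearing it on schedule-out would thus reuse the existing mechanism without threading a max_cstate parameter through default_idle(), pm_idle(), and friends.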
On 17/05/2019 10:22, Jan Beulich wrote:
>>>> On 17.05.19 at 09:48, <jgross@suse.com> wrote:
>> On 17/05/2019 08:57, Jan Beulich wrote:
>>>>>> On 17.05.19 at 07:13, <jgross@suse.com> wrote:
>>>> On 16/05/2019 16:41, Jan Beulich wrote:
>>>>>>>> On 16.05.19 at 15:51, <jgross@suse.com> wrote:
>>>>>> As with core scheduling we can be sure the other thread is active
>>>>>> (otherwise we would schedule the idle item) and hoping for saving power
>>>>>> by using mwait is moot.
>>>>>
>>>>> Saving power may be indirect, by the CPU re-arranging
>>>>> resource assignment between threads when one goes idle.
>>>>> I have no idea whether they do this when entering C1, or
>>>>> only when entering deeper C states.
>>>>
>>>> SDM Vol. 3 chapter 8.10.1 "HLT instruction":
>>>>
>>>> "Here shared resources that were being used by the halted logical
>>>> processor become available to active logical processors, allowing them
>>>> to execute at greater efficiency."
>>>
>>> To be honest, this is too broad/generic a statement to fully
>>> trust it, judging from other areas of the SDM. And then, as
>>> per above, what about the socket granularity case? Putting
>>> entirely idle cores to sleep is surely worthwhile?
>>
>> Yes, I assume it is. OTOH this might affect context switches badly
>> as the reaction time for the coordinated switch will rise. Maybe a
>> good reason for another sub-option?
>
> While I agree that fine grained control is useful, I'm seeing an
> increasing risk of there going to be too many controls to actually
> be certain in the end that all possible combinations work
> correctly.

Okay, I think I'll leave it as is for the moment and do some
performance tests later. Depending on the results we can then decide
how to proceed.

>>>>> And anyway - I'm still none the wiser as to the v->is_urgent
>>>>> relationship.
>>>>
>>>> With v->is_urgent set today's idle loop will drop into default_idle().
>>>> I can remove this sentence in case it is just confusing.
>>>
>>> I'd prefer if the connection would become more obvious. One
>>> needs to go from ->is_urgent via ->urgent_count to
>>> sched_has_urgent_vcpu() to find where the described
>>> behavior really lives.
>>>
>>> What's worse though: This won't work as intended on AMD
>>> at all. I don't think it's correct to fall back to default_idle() in
>>> this case. Instead sched_has_urgent_vcpu() returning true
>>> should amount to the same effect as max_cstate being set
>>> to 1. There's
>>> (a) no reason not to use MWAIT on Intel CPUs in this case,
>>> if MWAIT can enter C1, and
>>> (b) a strong need to use MWAIT on (at least) AMD Fam17,
>>> or else it won't be C1 that gets entered.
>>> I'll see about making a patch in due course.
>>
>> Thanks. Would you mind doing it in a way that the caller can specify
>> max_cstate? This would remove the need to call sched_has_urgent_vcpu()
>> deep down the idle handling and I could re-use it for my purpose.
>
> Hmm, to be honest I'm not fancying giving a parameter to
> default_idle(), pm_idle(), and friends. Conceptually it is not
> the business of the callers to control the C states to be used.
>
> What about the opposite: You simply mark idle (v)CPUs in
> question as "urgent", thus achieving the intended effect as
> well.

I can do that, of course. IMO letting the caller specify how it wants
idle to behave is cleaner than burying such an implicit dependency deep
down in the code, but you are the maintainer of that part.

Juergen
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index d04e704116..f3dbca5dba 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -154,6 +154,24 @@ static void idle_loop(void)
     }
 }
 
+/*
+ * Idle loop for siblings of active schedule items.
+ * We don't do any standard idle work like tasklets, page scrubbing or
+ * livepatching.
+ * Use default_idle() in order to simulate v->is_urgent.
+ */
+static void guest_idle_loop(void)
+{
+    unsigned int cpu = smp_processor_id();
+
+    for ( ; ; )
+    {
+        if ( !softirq_pending(cpu) )
+            default_idle();
+        do_softirq();
+    }
+}
+
 void startup_cpu_idle_loop(void)
 {
     struct vcpu *v = current;
@@ -167,6 +185,9 @@ void startup_cpu_idle_loop(void)
 
 static void noreturn continue_idle_domain(struct vcpu *v)
 {
+    if ( !is_idle_item(v->sched_item) )
+        reset_stack_and_jump(guest_idle_loop);
+
     reset_stack_and_jump(idle_loop);
 }
 
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 0de199ccc9..788ecc9e81 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -82,7 +82,18 @@ static struct scheduler __read_mostly ops;
 static inline struct vcpu *sched_item2vcpu_cpu(struct sched_item *item,
                                                unsigned int cpu)
 {
-    return item->domain->vcpu[item->item_id + per_cpu(sched_res_idx, cpu)];
+    unsigned int idx = item->item_id + per_cpu(sched_res_idx, cpu);
+    const struct domain *d = item->domain;
+    struct vcpu *v;
+
+    if ( idx < d->max_vcpus && d->vcpu[idx] )
+    {
+        v = d->vcpu[idx];
+        if ( v->new_state == RUNSTATE_running )
+            return v;
+    }
+
+    return idle_vcpu[cpu];
 }
 
 static inline struct scheduler *dom_scheduler(const struct domain *d)
@@ -196,8 +207,11 @@ static inline void vcpu_runstate_change(
 
     trace_runstate_change(v, new_state);
 
-    item->runstate_cnt[v->runstate.state]--;
-    item->runstate_cnt[new_state]++;
+    if ( !is_idle_vcpu(v) )
+    {
+        item->runstate_cnt[v->runstate.state]--;
+        item->runstate_cnt[new_state]++;
+    }
 
     delta = new_entry_time - v->runstate.state_entry_time;
     if ( delta > 0 )
@@ -209,21 +223,6 @@ static inline void vcpu_runstate_change(
     v->runstate.state = new_state;
 }
 
-static inline void sched_item_runstate_change(struct sched_item *item,
-    bool running, s_time_t new_entry_time)
-{
-    struct vcpu *v;
-
-    for_each_sched_item_vcpu( item, v )
-        if ( running )
-            vcpu_runstate_change(v, v->new_state, new_entry_time);
-        else
-            vcpu_runstate_change(v,
-                ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
-                 (vcpu_runnable(v) ? RUNSTATE_runnable : RUNSTATE_offline)),
-                new_entry_time);
-}
-
 void vcpu_runstate_get(struct vcpu *v, struct vcpu_runstate_info *runstate)
 {
     spinlock_t *lock = likely(v == current)
@@ -456,6 +455,7 @@ int sched_init_vcpu(struct vcpu *v)
     if ( is_idle_domain(d) )
     {
         per_cpu(sched_res, v->processor)->curr = item;
+        per_cpu(sched_res, v->processor)->sched_item_idle = item;
         v->is_running = 1;
         item->is_running = 1;
         item->state_entry_time = NOW();
@@ -1631,33 +1631,67 @@ static void sched_switch_items(struct sched_resource *sd,
                                struct sched_item *next,
                                struct sched_item *prev, s_time_t now)
 {
-    sd->curr = next;
-
-    TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id, prev->item_id,
-             now - prev->state_entry_time);
-    TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id, next->item_id,
-             (next->vcpu->runstate.state == RUNSTATE_runnable) ?
-             (now - next->state_entry_time) : 0, prev->next_time);
+    int cpu;
 
     ASSERT(item_running(prev));
 
-    TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->item_id,
-             next->domain->domain_id, next->item_id);
+    if ( prev != next )
+    {
+        sd->curr = next;
+        sd->prev = prev;
 
-    sched_item_runstate_change(prev, false, now);
-    prev->last_run_time = now;
+        TRACE_3D(TRC_SCHED_SWITCH_INFPREV, prev->domain->domain_id,
+                 prev->item_id, now - prev->state_entry_time);
+        TRACE_4D(TRC_SCHED_SWITCH_INFNEXT, next->domain->domain_id,
+                 next->item_id,
+                 (next->vcpu->runstate.state == RUNSTATE_runnable) ?
+                 (now - next->state_entry_time) : 0, prev->next_time);
+        TRACE_4D(TRC_SCHED_SWITCH, prev->domain->domain_id, prev->item_id,
+                 next->domain->domain_id, next->item_id);
 
-    ASSERT(!item_running(next));
-    sched_item_runstate_change(next, true, now);
+        prev->last_run_time = now;
 
-    /*
-     * NB. Don't add any trace records from here until the actual context
-     * switch, else lost_records resume will not work properly.
-     */
+        ASSERT(!item_running(next));
+
+        /*
+         * NB. Don't add any trace records from here until the actual context
+         * switch, else lost_records resume will not work properly.
+         */
+
+        ASSERT(!next->is_running);
+        next->is_running = 1;
 
-    ASSERT(!next->is_running);
-    next->vcpu->is_running = 1;
-    next->is_running = 1;
+        if ( is_idle_item(prev) )
+        {
+            prev->runstate_cnt[RUNSTATE_running] = 0;
+            prev->runstate_cnt[RUNSTATE_runnable] = sched_granularity;
+        }
+        if ( is_idle_item(next) )
+        {
+            next->runstate_cnt[RUNSTATE_running] = sched_granularity;
+            next->runstate_cnt[RUNSTATE_runnable] = 0;
+        }
+    }
+
+    for_each_cpu( cpu, sd->cpus )
+    {
+        struct vcpu *vprev = get_cpu_current(cpu);
+        struct vcpu *vnext = sched_item2vcpu_cpu(next, cpu);
+
+        if ( vprev != vnext || vprev->runstate.state != vnext->new_state )
+        {
+            vcpu_runstate_change(vprev,
+                ((vprev->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
+                 (vcpu_runnable(vprev) ? RUNSTATE_runnable : RUNSTATE_offline)),
+                now);
+            vcpu_runstate_change(vnext, vnext->new_state, now);
+        }
+
+        vnext->is_running = 1;
+
+        if ( is_idle_vcpu(vnext) )
+            vnext->sched_item = next;
+    }
 }
 
 static bool sched_tasklet_check(void)
@@ -1706,25 +1740,25 @@ static struct sched_item *do_schedule(struct sched_item *prev, s_time_t now)
     if ( prev->next_time >= 0 ) /* -ve means no limit */
         set_timer(&sd->s_timer, now + prev->next_time);
 
-    if ( likely(prev != next) )
-        sched_switch_items(sd, next, prev, now);
+    sched_switch_items(sd, next, prev, now);
 
     return next;
 }
 
-static void context_saved(struct vcpu *prev)
+static void context_saved(struct sched_item *item)
 {
-    struct sched_item *item = prev->sched_item;
-
     item->is_running = 0;
     item->state_entry_time = NOW();
+    this_cpu(sched_res)->prev = NULL;
 
     /* Check for migration request /after/ clearing running flag. */
     smp_mb();
 
-    sched_context_saved(vcpu_scheduler(prev), item);
+    sched_context_saved(vcpu_scheduler(item->vcpu), item);
 
-    sched_item_migrate_finish(item);
+    /* Idle never migrates and idle vcpus might belong to other items. */
+    if ( !is_idle_item(item) )
+        sched_item_migrate_finish(item);
 }
 
 /*
@@ -1741,11 +1775,13 @@ static void context_saved(struct vcpu *prev)
 void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
 {
     struct sched_item *next = vnext->sched_item;
+    struct sched_resource *sd = this_cpu(sched_res);
 
     /* Clear running flag /after/ writing context to memory. */
     smp_wmb();
 
-    vprev->is_running = 0;
+    if ( vprev != vnext )
+        vprev->is_running = 0;
 
     if ( atomic_read(&next->rendezvous_out_cnt) )
     {
@@ -1754,20 +1790,23 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
         /* Call context_saved() before releasing other waiters. */
         if ( cnt == 1 )
         {
-            if ( vprev != vnext )
-                context_saved(vprev);
+            if ( sd->prev )
+                context_saved(sd->prev);
             atomic_set(&next->rendezvous_out_cnt, 0);
         }
         else
             while ( atomic_read(&next->rendezvous_out_cnt) )
                 cpu_relax();
     }
-    else if ( vprev != vnext && sched_granularity == 1 )
-        context_saved(vprev);
+    else if ( sd->prev )
+        context_saved(sd->prev);
+
+    if ( is_idle_vcpu(vprev) && vprev != vnext )
+        vprev->sched_item = sd->sched_item_idle;
 }
 
 static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
-                                 s_time_t now)
+                                 bool reset_idle_item, s_time_t now)
 {
     if ( unlikely(vprev == vnext) )
     {
@@ -1776,6 +1815,10 @@ static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
                  now - vprev->runstate.state_entry_time,
                  vprev->sched_item->next_time);
         sched_context_switched(vprev, vnext);
+
+        if ( reset_idle_item )
+            vnext->sched_item = this_cpu(sched_res)->sched_item_idle;
+
         trace_continue_running(vnext);
         return continue_running(vprev);
     }
@@ -1851,7 +1894,8 @@ static void sched_slave(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    sched_context_switch(vprev, sched_item2vcpu_cpu(next, cpu), now);
+    sched_context_switch(vprev, sched_item2vcpu_cpu(next, cpu),
+                         is_idle_item(next) && !is_idle_item(prev), now);
 }
 
 /*
@@ -1911,7 +1955,8 @@ static void schedule(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
     vnext = sched_item2vcpu_cpu(next, cpu);
-    sched_context_switch(vprev, vnext, now);
+    sched_context_switch(vprev, vnext,
+                         !is_idle_item(prev) && is_idle_item(next), now);
 }
 
 /* The scheduler timer: force a run through the scheduler */
@@ -1993,6 +2038,7 @@ static int cpu_schedule_up(unsigned int cpu)
         return -ENOMEM;
 
     sd->curr = idle_vcpu[cpu]->sched_item;
+    sd->sched_item_idle = idle_vcpu[cpu]->sched_item;
 
     /*
      * We don't want to risk calling xfree() on an sd->sched_priv
@@ -2170,6 +2216,7 @@ void __init scheduler_init(void)
     if ( vcpu_create(idle_domain, 0) == NULL )
         BUG();
     this_cpu(sched_res)->curr = idle_vcpu[0]->sched_item;
+    this_cpu(sched_res)->sched_item_idle = idle_vcpu[0]->sched_item;
     this_cpu(sched_res)->sched_priv = sched_alloc_pdata(&ops, 0);
     BUG_ON(IS_ERR(this_cpu(sched_res)->sched_priv));
     scheduler_percpu_init(0);
diff --git a/xen/include/asm-arm/current.h b/xen/include/asm-arm/current.h
index c4af66fbb9..a7602eef8c 100644
--- a/xen/include/asm-arm/current.h
+++ b/xen/include/asm-arm/current.h
@@ -18,6 +18,7 @@ DECLARE_PER_CPU(struct vcpu *, curr_vcpu);
 
 #define current            (this_cpu(curr_vcpu))
 #define set_current(vcpu)  do { current = (vcpu); } while (0)
+#define get_cpu_current(cpu)  (per_cpu(curr_vcpu, cpu))
 
 /* Per-VCPU state that lives at the top of the stack */
 struct cpu_info {
diff --git a/xen/include/asm-x86/current.h b/xen/include/asm-x86/current.h
index 5bd64b2271..cb5b6f1176 100644
--- a/xen/include/asm-x86/current.h
+++ b/xen/include/asm-x86/current.h
@@ -76,6 +76,11 @@ struct cpu_info {
     /* get_stack_bottom() must be 16-byte aligned */
 };
 
+static inline struct cpu_info *get_cpu_info_from_stack(unsigned long sp)
+{
+    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
+}
+
 static inline struct cpu_info *get_cpu_info(void)
 {
 #ifdef __clang__
@@ -86,7 +91,7 @@ static inline struct cpu_info *get_cpu_info(void)
     register unsigned long sp asm("rsp");
 #endif
 
-    return (struct cpu_info *)((sp | (STACK_SIZE - 1)) + 1) - 1;
+    return get_cpu_info_from_stack(sp);
 }
 
 #define get_current()         (get_cpu_info()->current_vcpu)
diff --git a/xen/include/asm-x86/smp.h b/xen/include/asm-x86/smp.h
index 9f533f9072..51a31ab00a 100644
--- a/xen/include/asm-x86/smp.h
+++ b/xen/include/asm-x86/smp.h
@@ -76,6 +76,9 @@ void set_nr_sockets(void);
 
 /* Representing HT and core siblings in each socket.
*/ extern cpumask_t **socket_cpumask; +#define get_cpu_current(cpu) \ + (get_cpu_info_from_stack((unsigned long)stack_base[cpu])->current_vcpu) + #endif /* !__ASSEMBLY__ */ #endif diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h index b3921f3a41..8981d41629 100644 --- a/xen/include/xen/sched-if.h +++ b/xen/include/xen/sched-if.h @@ -39,6 +39,8 @@ struct sched_resource { spinlock_t *schedule_lock, _lock; struct sched_item *curr; /* current task */ + struct sched_item *sched_item_idle; + struct sched_item *prev; /* previous task */ void *sched_priv; struct timer s_timer; /* scheduling timer */ atomic_t urgent_count; /* how many urgent vcpus */ @@ -152,7 +154,7 @@ static inline void sched_clear_pause_flags_atomic(struct sched_item *item, static inline struct sched_item *sched_idle_item(unsigned int cpu) { - return idle_vcpu[cpu]->sched_item; + return per_cpu(sched_res, cpu)->sched_item_idle; } static inline unsigned int sched_get_resource_cpu(unsigned int cpu)
When scheduling an item with multiple vcpus there is no guarantee all
vcpus are available (e.g. above maxvcpus or vcpu offline). Fall back to
the idle vcpu of the current cpu in that case. This requires storing
the correct schedule_item pointer in the idle vcpu for as long as it is
used as a fallback vcpu.

In order to modify the runstates of the correct vcpus when switching
schedule items, merge sched_item_runstate_change() into
sched_switch_items() and loop over the affected physical cpus instead
of the item's vcpus. This in turn requires an access function for the
current variable of other cpus.

Today context_saved() is called when previous and next vcpus differ
during a context switch. With an idle vcpu able to substitute for an
offline vcpu this is problematic when switching to an idle scheduling
item: an idle previous vcpu leaves us in doubt which schedule item was
active previously, so save the previous item pointer in the
per-schedule resource area and use its being non-NULL as a hint whether
context_saved() should be called.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: new patch (Andrew Cooper)
---
 xen/arch/x86/domain.c         |  21 ++++++
 xen/common/schedule.c         | 153 +++++++++++++++++++++++++++---------------
 xen/include/asm-arm/current.h |   1 +
 xen/include/asm-x86/current.h |   7 +-
 xen/include/asm-x86/smp.h     |   3 +
 xen/include/xen/sched-if.h    |   4 +-
 6 files changed, 134 insertions(+), 55 deletions(-)