From patchwork Tue Mar 3 17:39:04 2020
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, George Dunlap, Dario Faggioli
Date: Tue, 3 Mar 2020 18:39:04 +0100
Message-Id: <20200303173904.23492-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v2] xen/sched: fix cpu offlining with core scheduling

Offlining a cpu with core scheduling active can result in a hanging
system. The reason is that the scheduling resource and the unit of the
cpu to be removed need to be split in order to remove the cpu from its
cpupool and move it to the idle scheduler. In case one of the involved
cpus happens to receive a sched slave event, because a vcpu formerly
running on that cpu is being woken up again, that cpu can enter
sched_wait_rendezvous_in() while its scheduling resource is just about
to be split. It might then wait forever for the other sibling to join,
which will never happen because the resources have already been
modified.
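For illustration, here is a minimal stand-alone sketch of the rendezvous
handshake involved (invented names and C11 atomics rather than Xen's
primitives; the real code in xen/common/sched/core.c runs under the
schedule lock):

#include <stdatomic.h>

struct unit {
    atomic_int rendezvous_in_cnt;   /* siblings still to arrive */
};

/*
 * Each sibling cpu of a core calls this on a scheduling event. The last
 * cpu to arrive picks the next task; all others spin until the counter
 * reaches zero. If the scheduling resource is split while a cpu sits in
 * this loop, the missing sibling never decrements the counter, so the
 * loop never terminates; that is the hang described above.
 */
static void rendezvous_in(struct unit *u)
{
    if ( atomic_fetch_sub(&u->rendezvous_in_cnt, 1) == 1 )
        return;                 /* last one in: would schedule here */

    while ( atomic_load(&u->rendezvous_in_cnt) != 0 )
        ;                       /* real code drops/retakes the lock */
}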
This can easily be avoided by:
- resetting the rendezvous counters of the idle unit which is kept
- checking for a new scheduling resource in sched_wait_rendezvous_in()
  after reacquiring the scheduling lock, and resetting the counters in
  that case without scheduling another vcpu
- moving scheduling resource modifications (in schedule_cpu_rm()) and
  retrievals (schedule(); sched_slave() is fine already, others are not
  critical) into locked regions

Reported-by: Igor Druzhinin
Signed-off-by: Juergen Gross
---
V2:
- fix unlocking, add some related comments
---
 xen/common/sched/core.c | 39 ++++++++++++++++++++++++++++++++-------
 1 file changed, 32 insertions(+), 7 deletions(-)

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index 7e8e7d2c39..5d8343b327 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -2299,6 +2299,10 @@ void sched_context_switched(struct vcpu *vprev, struct vcpu *vnext)
     rcu_read_unlock(&sched_res_rculock);
 }
 
+/*
+ * Switch to a new context or keep the current one running.
+ * On x86 it won't return, so it will drop the still held sched_res_rculock.
+ */
 static void sched_context_switch(struct vcpu *vprev, struct vcpu *vnext,
                                  bool reset_idle_unit, s_time_t now)
 {
@@ -2408,6 +2412,9 @@ static struct vcpu *sched_force_context_switch(struct vcpu *vprev,
  * zero do_schedule() is called and the rendezvous counter for leaving
  * context_switch() is set. All other members will wait until the counter is
  * becoming zero, dropping the schedule lock in between.
+ * Either returns the new unit to run, or NULL if no context switch is
+ * required or (on ARM) has already been performed. If NULL is returned
+ * sched_res_rculock has been dropped.
  */
 static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
                                                    spinlock_t **lock, int cpu,
@@ -2415,7 +2422,8 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
 {
     struct sched_unit *next;
     struct vcpu *v;
-    unsigned int gran = get_sched_res(cpu)->granularity;
+    struct sched_resource *sr = get_sched_res(cpu);
+    unsigned int gran = sr->granularity;
 
     if ( !--prev->rendezvous_in_cnt )
     {
@@ -2482,6 +2490,21 @@ static struct sched_unit *sched_wait_rendezvous_in(struct sched_unit *prev,
             atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
             prev->rendezvous_in_cnt = 0;
         }
+
+        /*
+         * Check for the scheduling resource having been switched. This
+         * happens when we are moved away from our cpupool and the cpus
+         * are now subject to the idle scheduler.
+         */
+        if ( unlikely(sr != get_sched_res(cpu)) )
+        {
+            ASSERT(is_idle_unit(prev));
+            atomic_set(&prev->next_task->rendezvous_out_cnt, 0);
+            prev->rendezvous_in_cnt = 0;
+            pcpu_schedule_unlock_irq(*lock, cpu);
+            rcu_read_unlock(&sched_res_rculock);
+            return NULL;
+        }
     }
 
     return prev->next_task;
@@ -2567,11 +2590,11 @@ static void schedule(void)
 
     rcu_read_lock(&sched_res_rculock);
 
+    lock = pcpu_schedule_lock_irq(cpu);
+
     sr = get_sched_res(cpu);
     gran = sr->granularity;
 
-    lock = pcpu_schedule_lock_irq(cpu);
-
     if ( prev->rendezvous_in_cnt )
     {
         /*
@@ -3151,7 +3174,10 @@ int schedule_cpu_rm(unsigned int cpu)
         per_cpu(sched_res_idx, cpu_iter) = 0;
         if ( cpu_iter == cpu )
         {
-            idle_vcpu[cpu_iter]->sched_unit->priv = NULL;
+            unit = idle_vcpu[cpu_iter]->sched_unit;
+            unit->priv = NULL;
+            atomic_set(&unit->next_task->rendezvous_out_cnt, 0);
+            unit->rendezvous_in_cnt = 0;
         }
         else
         {
@@ -3182,6 +3208,8 @@ int schedule_cpu_rm(unsigned int cpu)
     }
     sr->scheduler = &sched_idle_ops;
     sr->sched_priv = NULL;
+    sr->granularity = 1;
+    sr->cpupool = NULL;
 
     smp_mb();
     sr->schedule_lock = &sched_free_cpu_lock;
@@ -3194,9 +3222,6 @@ int schedule_cpu_rm(unsigned int cpu)
     sched_free_udata(old_ops, vpriv_old);
     sched_free_pdata(old_ops, ppriv_old, cpu);
 
-    sr->granularity = 1;
-    sr->cpupool = NULL;
-
  out:
     rcu_read_unlock(&sched_res_rculock);
     xfree(sr_new);
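The essence of the fix in sched_wait_rendezvous_in() is a re-validation
pattern: snapshot the scheduling resource pointer, and after any point
where the lock may have been dropped, compare it with the current
pointer and bail out on a mismatch. A compact, self-contained sketch of
that pattern with pthreads (all names invented, not the Xen code):

#include <pthread.h>
#include <stdbool.h>

struct resource { int id; };

static pthread_mutex_t res_lock = PTHREAD_MUTEX_INITIALIZER;
static struct resource *cur_res;    /* swapped during cpu offlining */

/*
 * Returns false if the resource changed while res_lock was dropped,
 * telling the caller to reset its rendezvous state and give up instead
 * of waiting for siblings that will never arrive.
 */
static bool wait_on_resource(struct resource *snap)
{
    pthread_mutex_lock(&res_lock);
    /* ... wait here, possibly dropping and retaking res_lock ... */
    if ( snap != cur_res )
    {
        pthread_mutex_unlock(&res_lock);
        return false;
    }
    pthread_mutex_unlock(&res_lock);
    return true;
}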