From patchwork Sat Sep 30 01:19:15 2017
X-Patchwork-Submitter: Dario Faggioli
X-Patchwork-Id: 9978963
From: Dario Faggioli
To: xen-devel@lists.xenproject.org
Date: Sat, 30 Sep 2017 03:19:15 +0200
Message-ID: <150673435563.13539.6118066928384180176.stgit@Solace.fritz.box>
User-Agent: StGit/0.17.1-dirty
Cc: Julien Grall, George Dunlap, Anshul Makkar
Subject: [Xen-devel] [PATCH] xen: Credit2: stop hitting a BUG_ON() when
 creating a Credit2 cpupool

This fixes hitting this BUG_ON():

  xen/common/sched_credit2.c:3452
    rqd = c2rqd(ops, cpu);
    BUG_ON(!cpumask_test_cpu(cpu, &rqd->active));

when Credit2 is used as the default scheduler and a cpupool, also using
Credit2, is created.

The BUG_ON() is hit because c2rqd() accesses
rqd[per_cpu(runq_map,cpu)]->active when the value of runq_map for this
CPU is -1 (and so we basically access random memory).

The bug was introduced in a2c4e5ab ("xen: credit2: make the cpu to
runqueue map per-cpu"). In fact, in that commit the CPU to runq map
became a global data structure (instead of a per-scheduler array, as it
was before that commit). So, when a CPU is deassigned from a scheduler,
its corresponding value in the runq map is set to -1. And this makes
things explode if the CPU is, at the same time, being assigned to
another scheduler.

Basically, when moving a CPU from one scheduler (cpupool) to another,
we call:
 - csched2_init_pdata(new_ops, cpu);
   |--> per_cpu(runq_map,cpu) = _new_runq_id_;
 - csched2_deinit_pdata(old_ops, cpu);
   |--> per_cpu(runq_map,cpu) = -1;
I.e., the deinit of the old scheduler runs after the init of the new
one, and overwrites the freshly stored runqueue ID with -1.

The solution is to store the ID of the runqueue a CPU belongs to in a
'csched2_pcpu' data structure (as, e.g., Credit does, for all the
per-pCPU pieces of information that it needs to keep), and to use the
sched_priv field of the per_cpu(schedule_data) structure to reach out
to it.
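To make the broken ordering concrete, here is a minimal, standalone
sketch of the clobbering (not Xen code: NR_CPUS, runq_map, init_pdata()
and deinit_pdata() are simplified stand-ins for the real per-CPU
variable and hooks):

    /* Simplified model of the post-a2c4e5ab state: ONE global map,
     * shared by all scheduler instances. */
    #include <assert.h>
    #include <stdio.h>

    #define NR_CPUS 4
    static int runq_map[NR_CPUS];

    static void init_pdata(int cpu, int runq_id) { runq_map[cpu] = runq_id; }
    static void deinit_pdata(int cpu)            { runq_map[cpu] = -1; }

    int main(void)
    {
        int cpu = 1;

        init_pdata(cpu, 0);  /* CPU 1 sits on runqueue 0 of the old scheduler */

        /* Moving CPU 1 to a new cpupool: init of the new scheduler's pdata
         * runs first, deinit of the old one's runs second... */
        init_pdata(cpu, 2);  /* runq_map[1] = 2 */
        deinit_pdata(cpu);   /* runq_map[1] = -1: the new ID is clobbered */

        printf("runq_map[%d] = %d\n", cpu, runq_map[cpu]);  /* -1, not 2 */
        assert(runq_map[cpu] != -1);  /* fails, like the BUG_ON() does */
        return 0;
    }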
This solution has the following properties:
 - it prevents Xen from crashing; :-)
 - it's more in line with how the schedule.c<-->sched_foo.c interface
   has been designed, and is supposed to be used (that sched_priv field
   is specifically meant to be used for this kind of data);
 - the improvement brought by a2c4e5ab is retained: the large array of
   int-s for the CPU to runq mapping, which each instance of Credit2
   was (needlessly) keeping before that commit, is *not* reintroduced.
   Actually, we use even less memory: instead of a per-cpu set of int-s
   (which meant there was one even for CPUs that don't belong to any
   Credit2 instance), we allocate a one-field-only (an int) struct, and
   only for the CPUs that do belong to a Credit2 instance.

Note that this also:
 * allows the init_pdata() internal helper function to return void,
   simplifying its callers (and making it more similar to its
   counterparts in other schedulers);
 * enables more extensive use of the c2rqd() helper, making the code
   more uniform and easier to read;
 * requires the hooks csched2_{alloc,free}_pdata to be defined for
   Credit2;
 * requires the assignment of new values to the 'scheduler' and
   'schedule_data.sched_priv' per-cpu variables to be moved up a bit,
   in csched2_switch_sched().

Signed-off-by: Dario Faggioli
---
Cc: George Dunlap
Cc: Anshul Makkar
Cc: Julien Grall
---
Release-wise, Julien, considering that:
 - we're not even frozen yet,
 - IAC, this is a bugfix,
I'd say it should go in.
---
 xen/common/sched_credit2.c | 133 +++++++++++++++++++++++++++++---------------
 1 file changed, 89 insertions(+), 44 deletions(-)

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 18f39ca..4a08a9f 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -504,9 +504,11 @@ struct csched2_private {
  * Physical CPU
  *
  * The only per-pCPU information we need to maintain is of which runqueue
- * each CPU is part of.
+ * each CPU is part of (-1 to mean 'to none').
  */
-static DEFINE_PER_CPU(int, runq_map);
+struct csched2_pcpu {
+    int runq_id;
+};
 
 /*
  * Virtual CPU
@@ -565,6 +567,11 @@ static inline struct csched2_private *csched2_priv(const struct scheduler *ops)
     return ops->sched_data;
 }
 
+static inline struct csched2_pcpu *csched2_pcpu(unsigned int cpu)
+{
+    return per_cpu(schedule_data, cpu).sched_priv;
+}
+
 static inline struct csched2_vcpu *csched2_vcpu(const struct vcpu *v)
 {
     return v->sched_priv;
@@ -578,7 +585,7 @@ static inline struct csched2_dom *csched2_dom(const struct domain *d)
 /* CPU to runq_id macro */
 static inline int c2r(unsigned int cpu)
 {
-    return per_cpu(runq_map, cpu);
+    return csched2_pcpu(cpu)->runq_id;
 }
 
 /* CPU to runqueue struct macro */
@@ -3769,31 +3776,33 @@ csched2_dump(const struct scheduler *ops)
 #undef cpustr
 }
 
-/* Returns the ID of the runqueue the cpu is assigned to. */
-static unsigned
-init_pdata(struct csched2_private *prv, unsigned int cpu)
+static void
+init_pdata(struct csched2_private *prv,
+           struct csched2_pcpu *spc,
+           unsigned int cpu)
 {
-    unsigned rqi;
     struct csched2_runqueue_data *rqd;
 
     ASSERT(rw_is_write_locked(&prv->lock));
+    /* The CPU must not have been initialized already, in this scheduler. */
     ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
+    /* The per-CPU data needs to be allocated, but STILL uninitialized. */
+    ASSERT(spc && spc->runq_id == -1);
+    /* alloc_pdata must have been called already. */
+    ASSERT(per_cpu(schedule_data, cpu).sched_priv == spc);
 
     /* Figure out which runqueue to put it in */
-    rqi = cpu_to_runqueue(prv, cpu);
+    spc->runq_id = cpu_to_runqueue(prv, cpu);
 
-    rqd = prv->rqd + rqi;
+    rqd = prv->rqd + spc->runq_id;
 
-    printk(XENLOG_INFO "Adding cpu %d to runqueue %d\n", cpu, rqi);
-    if ( ! cpumask_test_cpu(rqi, &prv->active_queues) )
+    printk(XENLOG_INFO "Adding cpu %d to runqueue %d\n", cpu, spc->runq_id);
+    if ( !cpumask_test_cpu(spc->runq_id, &prv->active_queues) )
     {
         printk(XENLOG_INFO " First cpu on runqueue, activating\n");
-        activate_runqueue(prv, rqi);
+        activate_runqueue(prv, spc->runq_id);
     }
 
-    /* Set the runqueue map */
-    per_cpu(runq_map, cpu) = rqi;
-
     __cpumask_set_cpu(cpu, &rqd->idle);
     __cpumask_set_cpu(cpu, &rqd->active);
     __cpumask_set_cpu(cpu, &prv->initialized);
@@ -3801,8 +3810,21 @@ init_pdata(struct csched2_private *prv, unsigned int cpu)
 
     if ( cpumask_weight(&rqd->active) == 1 )
         rqd->pick_bias = cpu;
+}
 
-    return rqi;
+static void *
+csched2_alloc_pdata(const struct scheduler *ops, int cpu)
+{
+    struct csched2_pcpu *spc;
+
+    spc = xzalloc(struct csched2_pcpu);
+    if ( spc == NULL )
+        return ERR_PTR(-ENOMEM);
+
+    /* Initial situation: not part of any runqueue. */
+    spc->runq_id = -1;
+
+    return spc;
 }
 
 static void
@@ -3811,20 +3833,14 @@ csched2_init_pdata(const struct scheduler *ops, void *pdata, int cpu)
     struct csched2_private *prv = csched2_priv(ops);
     spinlock_t *old_lock;
     unsigned long flags;
-    unsigned rqi;
-
-    /*
-     * pdata contains what alloc_pdata returned. But since we don't (need to)
-     * implement alloc_pdata, either that's NULL, or something is very wrong!
-     */
-    ASSERT(!pdata);
 
     write_lock_irqsave(&prv->lock, flags);
     old_lock = pcpu_schedule_lock(cpu);
 
-    rqi = init_pdata(prv, cpu);
+    init_pdata(prv, pdata, cpu);
+
     /* Move the scheduler lock to the new runq lock. */
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+    per_cpu(schedule_data, cpu).schedule_lock = &c2rqd(ops, cpu)->lock;
 
     /* _Not_ pcpu_schedule_unlock(): schedule_lock may have changed! */
     spin_unlock(old_lock);
@@ -3838,9 +3854,8 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 {
     struct csched2_private *prv = csched2_priv(new_ops);
     struct csched2_vcpu *svc = vdata;
-    unsigned rqi;
 
-    ASSERT(!pdata && svc && is_idle_vcpu(svc->vcpu));
+    ASSERT(pdata && svc && is_idle_vcpu(svc->vcpu));
 
     /*
      * We own one runqueue lock already (from schedule_cpu_switch()). This
@@ -3855,7 +3870,10 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
 
     idle_vcpu[cpu]->sched_priv = vdata;
 
-    rqi = init_pdata(prv, cpu);
+    per_cpu(scheduler, cpu) = new_ops;
+    per_cpu(schedule_data, cpu).sched_priv = pdata;
+
+    init_pdata(prv, pdata, cpu);
 
     /*
      * Now that we know what runqueue we'll go in, double check what's said
@@ -3863,10 +3881,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * this scheduler, and so it's safe to have taken it /before/ our
      * private global lock.
      */
-    ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &prv->rqd[rqi].lock);
-
-    per_cpu(scheduler, cpu) = new_ops;
-    per_cpu(schedule_data, cpu).sched_priv = NULL; /* no pdata */
+    ASSERT(per_cpu(schedule_data, cpu).schedule_lock != &c2rqd(new_ops, cpu)->lock);
 
     /*
      * (Re?)route the lock to the per pCPU lock as /last/ thing. In fact,
@@ -3874,7 +3889,7 @@ csched2_switch_sched(struct scheduler *new_ops, unsigned int cpu,
      * taking it, find all the initializations we've done above in place.
      */
     smp_mb();
-    per_cpu(schedule_data, cpu).schedule_lock = &prv->rqd[rqi].lock;
+    per_cpu(schedule_data, cpu).schedule_lock = &c2rqd(new_ops, cpu)->lock;
 
     write_unlock(&prv->lock);
 }
@@ -3885,25 +3900,27 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     unsigned long flags;
     struct csched2_private *prv = csched2_priv(ops);
     struct csched2_runqueue_data *rqd;
-    int rqi;
+    struct csched2_pcpu *spc = pcpu;
 
     write_lock_irqsave(&prv->lock, flags);
 
+    /* Both alloc_pdata and init_pdata must have been called on this CPU. */
+    ASSERT(spc && spc->runq_id != -1 &&
+           cpumask_test_cpu(cpu, &prv->initialized));
+
     /*
-     * alloc_pdata is not implemented, so pcpu must be NULL. On the other
-     * hand, init_pdata must have been called for this pCPU.
+     * NB. Since per_cpu(schedule_data,cpu).sched_priv may have been set to
+     * something that is different than pcpu already, we **can't** use the
+     * csched2_pcpu(), c2r() and c2rqd() helpers in here!
      */
-    ASSERT(!pcpu && cpumask_test_cpu(cpu, &prv->initialized));
-
-    /* Find the old runqueue and remove this cpu from it */
-    rqi = per_cpu(runq_map, cpu);
-    rqd = prv->rqd + rqi;
+    /* Find the old runqueue and remove this cpu from it */
+    rqd = prv->rqd + spc->runq_id;
 
     /* No need to save IRQs here, they're already disabled */
     spin_lock(&rqd->lock);
 
-    printk(XENLOG_INFO "Removing cpu %d from runqueue %d\n", cpu, rqi);
+    printk(XENLOG_INFO "Removing cpu %d from runqueue %d\n", cpu, spc->runq_id);
 
     __cpumask_clear_cpu(cpu, &rqd->idle);
     __cpumask_clear_cpu(cpu, &rqd->smt_idle);
@@ -3912,12 +3929,12 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     if ( cpumask_empty(&rqd->active) )
     {
         printk(XENLOG_INFO " No cpus left on runqueue, disabling\n");
-        deactivate_runqueue(prv, rqi);
+        deactivate_runqueue(prv, spc->runq_id);
     }
     else if ( rqd->pick_bias == cpu )
         rqd->pick_bias = cpumask_first(&rqd->active);
 
-    per_cpu(runq_map, cpu) = -1;
+    spc->runq_id = -1;
 
     spin_unlock(&rqd->lock);
 
@@ -3928,6 +3945,32 @@ csched2_deinit_pdata(const struct scheduler *ops, void *pcpu, int cpu)
     return;
 }
 
+static void
+csched2_free_pdata(const struct scheduler *ops, void *pcpu, int cpu)
+{
+    struct csched2_private *prv = csched2_priv(ops);
+
+    /*
+     * We want to be sure that either init_pdata has never been called, for
+     * this CPU in this scheduler, or that deinit_pdata has been called on
+     * it already.
+     */
+    ASSERT(!cpumask_test_cpu(cpu, &prv->initialized));
+
+    /*
+     * pcpu can be NULL, in case we are coming from CPU_UP_CANCELLED
+     * (because bringing up the CPU failed). And if it is not, it must,
+     * for the same reasons explained in the above comment, be invalid.
+     */
+    if ( pcpu != NULL )
+    {
+        struct csched2_pcpu *spc = pcpu;
+
+        ASSERT(spc->runq_id == -1);
+        xfree(spc);
+    }
+}
+
 static int
 csched2_init(struct scheduler *ops)
 {
@@ -4045,8 +4088,10 @@ static const struct scheduler sched_credit2_def = {
     .deinit         = csched2_deinit,
     .alloc_vdata    = csched2_alloc_vdata,
     .free_vdata     = csched2_free_vdata,
+    .alloc_pdata    = csched2_alloc_pdata,
     .init_pdata     = csched2_init_pdata,
     .deinit_pdata   = csched2_deinit_pdata,
+    .free_pdata     = csched2_free_pdata,
     .switch_sched   = csched2_switch_sched,
     .alloc_domdata  = csched2_alloc_domdata,
     .free_domdata   = csched2_free_domdata,
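
P.S. For completeness, a companion sketch (same hypothetical stand-in
names as the one above) of why the pdata-based scheme is immune to the
clobbering: deinit now acts on the old scheduler's own csched2_pcpu,
handed to it by the core, rather than on a shared per-CPU slot:

    #include <assert.h>
    #include <stdio.h>
    #include <stdlib.h>

    struct pcpu { int runq_id; };    /* stand-in for struct csched2_pcpu */
    static struct pcpu *sched_priv;  /* stand-in for per_cpu(schedule_data).sched_priv */

    static struct pcpu *alloc_pdata(void)
    {
        struct pcpu *spc = calloc(1, sizeof(*spc));
        assert(spc != NULL);
        spc->runq_id = -1;           /* not part of any runqueue yet */
        return spc;
    }

    static void init_pdata(struct pcpu *spc, int runq_id) { spc->runq_id = runq_id; }
    static void deinit_pdata(struct pcpu *spc)            { spc->runq_id = -1; }

    int main(void)
    {
        struct pcpu *old_spc = alloc_pdata();
        struct pcpu *new_spc = alloc_pdata();

        init_pdata(old_spc, 0);      /* the CPU starts on runqueue 0 */
        sched_priv = old_spc;

        /* Switching: the new instance installs and inits its own pdata... */
        sched_priv = new_spc;
        init_pdata(new_spc, 2);

        /* ...and the old instance deinits the object *it* owns, only. */
        deinit_pdata(old_spc);

        printf("runq_id via sched_priv = %d\n", sched_priv->runq_id);  /* 2, not -1 */
        assert(sched_priv->runq_id == 2);

        free(old_spc);
        free(new_spc);
        return 0;
    }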