From patchwork Mon May 6 06:56:33 2019
X-Patchwork-Submitter: Jürgen Groß
X-Patchwork-Id: 10930565
From: Juergen Gross
To: xen-devel@lists.xenproject.org
Cc: Juergen Gross, Stefano Stabellini, Wei Liu, Konrad Rzeszutek Wilk,
    George Dunlap, Andrew Cooper, Ian Jackson, Robert VanVossen,
    Tim Deegan, Julien Grall, Josh Whitehead, Meng Xu, Jan Beulich,
    Dario Faggioli
Date: Mon, 6 May 2019 08:56:33 +0200
Message-Id: <20190506065644.7415-35-jgross@suse.com>
In-Reply-To: <20190506065644.7415-1-jgross@suse.com>
References: <20190506065644.7415-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH RFC V2 34/45] xen/sched: introduce item_runnable_state()

Today the runstate of a newly scheduled vcpu is always set to "running",
even if at that time vcpu_runnable() is already returning false due to a
race (e.g. with pausing the vcpu).

With core scheduling this can no longer work, as not all vcpus of a
schedule item have to be "running" when the item is being scheduled. So
the vcpu's new runstate has to be selected at the same time as the
runnability of the related schedule item is probed.
For this purpose introduce a new helper item_runnable_state() which
saves the new runstate of all tested vcpus in a new field of the vcpu
struct.

Signed-off-by: Juergen Gross
---
RFC V2: new patch

(Two illustrative sketches of the intended semantics follow the patch.)
---
 xen/common/domain.c         |  1 +
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 49 ++++++++++++++++++++++++---------------------
 xen/common/sched_credit2.c  |  7 ++++---
 xen/common/sched_null.c     |  3 ++-
 xen/common/sched_rt.c       |  8 +++++++-
 xen/common/schedule.c       |  2 +-
 xen/include/xen/sched-if.h  | 14 +++++++++++++
 xen/include/xen/sched.h     |  1 +
 9 files changed, 57 insertions(+), 30 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 63ef64e4d4..97c0fdd797 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -151,6 +151,7 @@ struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
     if ( is_idle_domain(d) )
     {
         v->runstate.state = RUNSTATE_running;
+        v->new_state = RUNSTATE_running;
     }
     else
     {
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index e98e98116b..72d7063f6a 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -557,7 +557,7 @@ a653sched_do_schedule(
     if ( !((new_task != NULL)
            && (AITEM(new_task) != NULL)
            && AITEM(new_task)->awake
-           && item_runnable(new_task)) )
+           && item_runnable_state(new_task)) )
         new_task = IDLETASK(cpu);
 
     BUG_ON(new_task == NULL);
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 064f88ab23..804c675e25 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1907,7 +1907,7 @@ static void csched_schedule(
     if ( !test_bit(CSCHED_FLAG_ITEM_YIELD, &scurr->flags)
          && !tasklet_work_scheduled
          && prv->ratelimit
-         && item_runnable(item)
+         && item_runnable_state(item)
          && !is_idle_item(item)
          && runtime < prv->ratelimit )
     {
@@ -1952,33 +1952,36 @@ static void csched_schedule(
         dec_nr_runnable(sched_cpu);
     }
 
-    snext = __runq_elem(runq->next);
-
-    /* Tasklet work (which runs in idle ITEM context) overrides all else. */
-    if ( tasklet_work_scheduled )
-    {
-        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_ITEM(sched_idle_item(sched_cpu));
-        snext->pri = CSCHED_PRI_TS_BOOST;
-    }
-
     /*
      * Clear YIELD flag before scheduling out
      */
     clear_bit(CSCHED_FLAG_ITEM_YIELD, &scurr->flags);
 
-    /*
-     * SMP Load balance:
-     *
-     * If the next highest priority local runnable ITEM has already eaten
-     * through its credits, look on other PCPUs to see if we have more
-     * urgent work... If not, csched_load_balance() will return snext, but
-     * already removed from the runq.
-     */
-    if ( snext->pri > CSCHED_PRI_TS_OVER )
-        __runq_remove(snext);
-    else
-        snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+    do {
+        snext = __runq_elem(runq->next);
+
+        /* Tasklet work (which runs in idle ITEM context) overrides all else. */
+        if ( tasklet_work_scheduled )
+        {
+            TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
+            snext = CSCHED_ITEM(sched_idle_item(sched_cpu));
+            snext->pri = CSCHED_PRI_TS_BOOST;
+        }
+
+        /*
+         * SMP Load balance:
+         *
+         * If the next highest priority local runnable ITEM has already eaten
+         * through its credits, look on other PCPUs to see if we have more
+         * urgent work... If not, csched_load_balance() will return snext, but
+         * already removed from the runq.
+         */
+        if ( snext->pri > CSCHED_PRI_TS_OVER )
+            __runq_remove(snext);
+        else
+            snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+
+    } while ( !item_runnable_state(snext->item) );
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index f1074be25d..f306821c6c 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3288,7 +3288,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && item_runnable(scurr->item) &&
+    if ( !yield && prv->ratelimit_us && item_runnable_state(scurr->item) &&
          (now - scurr->item->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3342,7 +3342,7 @@ runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( item_runnable(scurr->item) && !soft_aff_preempt )
+    if ( item_runnable_state(scurr->item) && !soft_aff_preempt )
         snext = scurr;
     else
         snext = csched2_item(sched_idle_item(cpu));
@@ -3402,7 +3402,8 @@ runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || item_grab_budget(svc)) )
+             (!has_cap(svc) || item_grab_budget(svc)) &&
+             item_runnable_state(svc->item) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 1af396dcdb..1f59cbae75 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -784,7 +784,8 @@ static void null_schedule(const struct scheduler *ops, struct sched_item *prev,
         spin_unlock(&prv->waitq_lock);
     }
 
-    if ( unlikely(prev->next_task == NULL || !item_runnable(prev->next_task)) )
+    if ( unlikely(prev->next_task == NULL ||
+                  !item_runnable_state(prev->next_task)) )
         prev->next_task = sched_idle_item(sched_cpu);
 
     NULL_ITEM_CHECK(prev->next_task);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index c5e8b559f3..3900e2159d 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1101,12 +1101,18 @@ rt_schedule(const struct scheduler *ops, struct sched_item *curritem,
     else
     {
         snext = runq_pick(ops, cpumask_of(sched_cpu));
+
         if ( snext == NULL )
             snext = rt_item(sched_idle_item(sched_cpu));
+        else if ( !item_runnable_state(snext->item) )
+        {
+            q_remove(snext);
+            snext = rt_item(sched_idle_item(sched_cpu));
+        }
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_item(curritem) &&
-             item_runnable(curritem) &&
+             item_runnable_state(curritem) &&
              scurr->cur_budget > 0 &&
              ( is_idle_item(snext->item) ||
                compare_item_priority(scurr, snext) > 0 ) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 9d65586cae..6ba6e70338 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -206,7 +206,7 @@ static inline void sched_item_runstate_change(struct sched_item *item,
     struct vcpu *v = item->vcpu;
 
     if ( running )
-        vcpu_runstate_change(v, RUNSTATE_running, new_entry_time);
+        vcpu_runstate_change(v, v->new_state, new_entry_time);
     else
         vcpu_runstate_change(v,
             ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index 084b78d2b7..755b0f8f74 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -60,6 +60,20 @@ static inline bool item_runnable(const struct sched_item *item)
     return vcpu_runnable(item->vcpu);
 }
 
+static inline bool item_runnable_state(const struct sched_item *item)
+{
+    struct vcpu *v;
+    bool runnable;
+
+    v = item->vcpu;
+    runnable = vcpu_runnable(v);
+
+    v->new_state = runnable ? RUNSTATE_running
+                 : (v->pause_flags & VPF_blocked)
+                 ? RUNSTATE_blocked : RUNSTATE_offline;
+    return runnable;
+}
+
 static inline void sched_set_res(struct sched_item *item,
                                  struct sched_resource *res)
 {
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index 5224f0aa70..40d3def9f4 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -174,6 +174,7 @@ struct vcpu
         XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
     } runstate_guest; /* guest address */
 #endif
+    int new_state;
 
     /* Has the FPU been initialised? */
     bool fpu_initialised;
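
A minimal, self-contained sketch of the semantics item_runnable_state()
is meant to provide, for illustration only: the toy_vcpu type and its
"blocked"/"paused" flags are hypothetical stand-ins for struct vcpu,
VPF_blocked and vcpu_runnable(); only the decision logic mirrors the
helper added by the patch.

#include <stdbool.h>
#include <stdio.h>

enum runstate { RS_RUNNING, RS_RUNNABLE, RS_BLOCKED, RS_OFFLINE };

struct toy_vcpu {
    bool blocked;              /* stand-in for pause_flags & VPF_blocked */
    bool paused;               /* stand-in for any other !runnable cause */
    enum runstate new_state;   /* stand-in for v->new_state */
};

/*
 * Probe runnability and latch the runstate to switch to in the same
 * step, so a pause racing with the scheduler between picking the vcpu
 * and running it cannot leave the vcpu marked "running".
 */
static bool toy_item_runnable_state(struct toy_vcpu *v)
{
    bool runnable = !v->blocked && !v->paused;

    v->new_state = runnable ? RS_RUNNING
                            : v->blocked ? RS_BLOCKED : RS_OFFLINE;
    return runnable;
}

int main(void)
{
    struct toy_vcpu v = { .blocked = false, .paused = false };

    toy_item_runnable_state(&v);
    printf("free vcpu   -> new_state %d\n", v.new_state);  /* RS_RUNNING */

    v.paused = true;          /* the race: vcpu paused after being picked */
    toy_item_runnable_state(&v);
    printf("paused vcpu -> new_state %d\n", v.new_state);  /* RS_OFFLINE */

    return 0;
}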
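
In the same spirit, a sketch of the retry pattern the patch adds to the
schedulers' pick paths (e.g. the new do/while in csched_schedule()):
candidates failing the combined probe are skipped, and the pick falls
back to the idle item when nothing runnable is left. toy_pick() and the
flat queue layout are hypothetical simplifications, not the real runq.

#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct toy_item {
    const char *name;
    bool runnable;             /* what item_runnable_state() would return */
};

static struct toy_item toy_idle = { "idle", true };

static struct toy_item *toy_pick(struct toy_item *q, size_t n)
{
    size_t i;

    /*
     * Skip queue heads whose runstate probe fails; never return an
     * entry that would have to be switched to "running" while it is
     * no longer actually runnable.
     */
    for ( i = 0; i < n; i++ )
        if ( q[i].runnable )
            return &q[i];

    return &toy_idle;          /* queue exhausted: schedule the idle item */
}

int main(void)
{
    struct toy_item q[] = { { "d1v0", false }, { "d2v3", true } };

    printf("picked %s\n", toy_pick(q, 2)->name);  /* d2v3 */
    printf("picked %s\n", toy_pick(q, 0)->name);  /* idle */

    return 0;
}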