[v4,29/46] xen/sched: introduce unit_runnable_state()

Message ID	20190927070050.12405-30-jgross@suse.com (mailing list archive)
State	Superseded

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6AC5E20872
From: Juergen Gross <jgross@suse.com>
To: xen-devel@lists.xenproject.org
Date: Fri, 27 Sep 2019 09:00:33 +0200
Message-Id: <20190927070050.12405-30-jgross@suse.com>
In-Reply-To: <20190927070050.12405-1-jgross@suse.com>
References: <20190927070050.12405-1-jgross@suse.com>
Subject: [Xen-devel] [PATCH v4 29/46] xen/sched: introduce
 unit_runnable_state()
Precedence: list
Cc: Juergen Gross <jgross@suse.com>,
 Stefano Stabellini <sstabellini@kernel.org>, Wei Liu <wl@xen.org>,
 Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
 George Dunlap <George.Dunlap@eu.citrix.com>,
 Andrew Cooper <andrew.cooper3@citrix.com>,
 Ian Jackson <ian.jackson@eu.citrix.com>,
 Robert VanVossen <robert.vanvossen@dornerworks.com>,
 Tim Deegan <tim@xen.org>,
 Julien Grall <julien.grall@arm.com>,
 Josh Whitehead <josh.whitehead@dornerworks.com>,
 Meng Xu <mengxu@cis.upenn.edu>, Jan Beulich <jbeulich@suse.com>,
 Dario Faggioli <dfaggioli@suse.com>
MIME-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: base64
Errors-To: xen-devel-bounces@lists.xenproject.org
Sender: "Xen-devel" <xen-devel-bounces@lists.xenproject.org>

Series

xen: add core scheduling support

Commit Message

Jürgen Groß Sept. 27, 2019, 7 a.m. UTC

Today the runstate of a newly scheduled vcpu is always set to
"running", even if at that time vcpu_runnable() is already returning
false due to a race (e.g. with pausing the vcpu).

With core scheduling this can no longer work as not all vcpus of a
schedule unit have to be "running" when being scheduled. So the vcpu's
new runstate has to be selected at the same time as the runnability of
the related schedule unit is probed.

For this purpose introduce a new helper unit_runnable_state() which
will save the new runstate of all tested vcpus in a new field of the
vcpu struct.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2:
- new patch
V3:
- add vcpu loop to unit_runnable_state() right now instead of doing
  so in next patch (Jan Beulich, Dario Faggioli)
- make new_state unsigned int (Jan Beulich)
V4:
- add comment explaining unit_runnable_state() (Jan Beulich)
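
Illustration (not part of the patch): a minimal standalone sketch of the
probe-then-apply idea described in the commit message. The struct layouts,
the flag bit positions, the reduced vcpu_runnable() check and the helpers
unit_switch_in() and example_mixed_unit() are simplified assumptions made
for this example only; they are not the Xen definitions.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins; values and layouts are assumptions for this sketch. */
enum runstate {
    RUNSTATE_running,
    RUNSTATE_runnable,
    RUNSTATE_blocked,
    RUNSTATE_offline
};

#define VPF_blocked (1UL << 0)   /* illustrative bit positions */
#define VPF_down    (1UL << 1)

struct vcpu {
    unsigned long pause_flags;
    unsigned int  pause_count;
    unsigned int  runstate;      /* stands in for v->runstate.state */
    unsigned int  new_state;     /* the field added by this patch */
};

struct sched_unit {
    struct vcpu *vcpus;
    size_t       nr_vcpus;
    bool         is_idle;
};

/* Reduced model: a vcpu is runnable when nothing pauses or blocks it. */
static bool vcpu_runnable(const struct vcpu *v)
{
    return !(v->pause_flags | v->pause_count);
}

/*
 * Probe step: report whether any vcpu of the unit is runnable and record,
 * in the same pass, the runstate each vcpu should enter.
 */
static bool unit_runnable_state(const struct sched_unit *unit)
{
    bool ret = false;

    if ( unit->is_idle )
        return true;

    for ( size_t i = 0; i < unit->nr_vcpus; i++ )
    {
        struct vcpu *v = &unit->vcpus[i];
        bool runnable = vcpu_runnable(v);

        v->new_state = runnable ? RUNSTATE_running
                                : (v->pause_flags & VPF_blocked)
                                  ? RUNSTATE_blocked : RUNSTATE_offline;

        if ( runnable )
            ret = true;
    }

    return ret;
}

/* Apply step (hypothetical helper): use the recorded state, do not re-probe. */
static void unit_switch_in(struct sched_unit *unit)
{
    for ( size_t i = 0; i < unit->nr_vcpus; i++ )
        unit->vcpus[i].runstate = unit->vcpus[i].new_state;
}

/* A two-vcpu unit in which vcpu 1 blocked before the unit was picked. */
bool example_mixed_unit(unsigned int states[2])
{
    struct vcpu v[2] = {
        { .pause_flags = 0,           .runstate = RUNSTATE_runnable },
        { .pause_flags = VPF_blocked, .runstate = RUNSTATE_blocked  },
    };
    struct sched_unit unit = { .vcpus = v, .nr_vcpus = 2, .is_idle = false };
    bool runnable = unit_runnable_state(&unit);

    if ( runnable )
        unit_switch_in(&unit);

    states[0] = v[0].runstate;
    states[1] = v[1].runstate;

    return runnable;
}
```

In this model the unit counts as runnable because vcpu 0 can run, while
vcpu 1 keeps a "blocked" runstate instead of being forced to "running".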
---
 xen/common/domain.c         |  1 +
 xen/common/sched_arinc653.c |  2 +-
 xen/common/sched_credit.c   | 49 ++++++++++++++++++++++++---------------------
 xen/common/sched_credit2.c  |  7 ++++---
 xen/common/sched_null.c     |  3 ++-
 xen/common/sched_rt.c       |  8 +++++++-
 xen/common/schedule.c       |  2 +-
 xen/include/xen/sched-if.h  | 30 +++++++++++++++++++++++++++
 xen/include/xen/sched.h     |  1 +
 9 files changed, 73 insertions(+), 30 deletions(-)

diff --git a/xen/common/domain.c b/xen/common/domain.c
index 699e63361b..466b9c1b73 100644
--- a/xen/common/domain.c
+++ b/xen/common/domain.c
@@ -157,6 +157,7 @@  struct vcpu *vcpu_create(struct domain *d, unsigned int vcpu_id)
     if ( is_idle_domain(d) )
     {
         v->runstate.state = RUNSTATE_running;
+        v->new_state = RUNSTATE_running;
     }
     else
     {
diff --git a/xen/common/sched_arinc653.c b/xen/common/sched_arinc653.c
index fcf81db19a..dd5876eacd 100644
--- a/xen/common/sched_arinc653.c
+++ b/xen/common/sched_arinc653.c
@@ -563,7 +563,7 @@  a653sched_do_schedule(
     if ( !((new_task != NULL)
            && (AUNIT(new_task) != NULL)
            && AUNIT(new_task)->awake
-           && unit_runnable(new_task)) )
+           && unit_runnable_state(new_task)) )
         new_task = IDLETASK(cpu);
     BUG_ON(new_task == NULL);
 
diff --git a/xen/common/sched_credit.c b/xen/common/sched_credit.c
index 299eff21ac..00beac3ea4 100644
--- a/xen/common/sched_credit.c
+++ b/xen/common/sched_credit.c
@@ -1894,7 +1894,7 @@  static void csched_schedule(
     if ( !test_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags)
          && !tasklet_work_scheduled
          && prv->ratelimit
-         && unit_runnable(unit)
+         && unit_runnable_state(unit)
          && !is_idle_unit(unit)
          && runtime < prv->ratelimit )
     {
@@ -1939,33 +1939,36 @@  static void csched_schedule(
         dec_nr_runnable(sched_cpu);
     }
 
-    snext = __runq_elem(runq->next);
-
-    /* Tasklet work (which runs in idle UNIT context) overrides all else. */
-    if ( tasklet_work_scheduled )
-    {
-        TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
-        snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
-        snext->pri = CSCHED_PRI_TS_BOOST;
-    }
-
     /*
      * Clear YIELD flag before scheduling out
      */
     clear_bit(CSCHED_FLAG_UNIT_YIELD, &scurr->flags);
 
-    /*
-     * SMP Load balance:
-     *
-     * If the next highest priority local runnable UNIT has already eaten
-     * through its credits, look on other PCPUs to see if we have more
-     * urgent work... If not, csched_load_balance() will return snext, but
-     * already removed from the runq.
-     */
-    if ( snext->pri > CSCHED_PRI_TS_OVER )
-        __runq_remove(snext);
-    else
-        snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+    do {
+        snext = __runq_elem(runq->next);
+
+        /* Tasklet work (which runs in idle UNIT context) overrides all else. */
+        if ( tasklet_work_scheduled )
+        {
+            TRACE_0D(TRC_CSCHED_SCHED_TASKLET);
+            snext = CSCHED_UNIT(sched_idle_unit(sched_cpu));
+            snext->pri = CSCHED_PRI_TS_BOOST;
+        }
+
+        /*
+         * SMP Load balance:
+         *
+         * If the next highest priority local runnable UNIT has already eaten
+         * through its credits, look on other PCPUs to see if we have more
+         * urgent work... If not, csched_load_balance() will return snext, but
+         * already removed from the runq.
+         */
+        if ( snext->pri > CSCHED_PRI_TS_OVER )
+            __runq_remove(snext);
+        else
+            snext = csched_load_balance(prv, sched_cpu, snext, &migrated);
+
+    } while ( !unit_runnable_state(snext->unit) );
 
     /*
      * Update idlers mask if necessary. When we're idling, other CPUs
diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index 87d142bbe4..0e29e56d5a 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -3291,7 +3291,7 @@  runq_candidate(struct csched2_runqueue_data *rqd,
      * In fact, it may be the case that scurr is about to spin, and there's
      * no point forcing it to do so until rate limiting expires.
      */
-    if ( !yield && prv->ratelimit_us && unit_runnable(scurr->unit) &&
+    if ( !yield && prv->ratelimit_us && unit_runnable_state(scurr->unit) &&
          (now - scurr->unit->state_entry_time) < MICROSECS(prv->ratelimit_us) )
     {
         if ( unlikely(tb_init_done) )
@@ -3345,7 +3345,7 @@  runq_candidate(struct csched2_runqueue_data *rqd,
      *
      * Of course, we also default to idle also if scurr is not runnable.
      */
-    if ( unit_runnable(scurr->unit) && !soft_aff_preempt )
+    if ( unit_runnable_state(scurr->unit) && !soft_aff_preempt )
         snext = scurr;
     else
         snext = csched2_unit(sched_idle_unit(cpu));
@@ -3405,7 +3405,8 @@  runq_candidate(struct csched2_runqueue_data *rqd,
          * some budget, then choose it.
          */
         if ( (yield || svc->credit > snext->credit) &&
-             (!has_cap(svc) || unit_grab_budget(svc)) )
+             (!has_cap(svc) || unit_grab_budget(svc)) &&
+             unit_runnable_state(svc->unit) )
             snext = svc;
 
         /* In any case, if we got this far, break. */
diff --git a/xen/common/sched_null.c b/xen/common/sched_null.c
index 80a7d45935..3dde1dcd00 100644
--- a/xen/common/sched_null.c
+++ b/xen/common/sched_null.c
@@ -864,7 +864,8 @@  static void null_schedule(const struct scheduler *ops, struct sched_unit *prev,
             cpumask_set_cpu(sched_cpu, &prv->cpus_free);
     }
 
-    if ( unlikely(prev->next_task == NULL || !unit_runnable(prev->next_task)) )
+    if ( unlikely(prev->next_task == NULL ||
+                  !unit_runnable_state(prev->next_task)) )
         prev->next_task = sched_idle_unit(sched_cpu);
 
     NULL_UNIT_CHECK(prev->next_task);
diff --git a/xen/common/sched_rt.c b/xen/common/sched_rt.c
index cfd7d334fa..fd882f2ca4 100644
--- a/xen/common/sched_rt.c
+++ b/xen/common/sched_rt.c
@@ -1092,12 +1092,18 @@  rt_schedule(const struct scheduler *ops, struct sched_unit *currunit,
     else
     {
         snext = runq_pick(ops, cpumask_of(sched_cpu));
+
         if ( snext == NULL )
             snext = rt_unit(sched_idle_unit(sched_cpu));
+        else if ( !unit_runnable_state(snext->unit) )
+        {
+            q_remove(snext);
+            snext = rt_unit(sched_idle_unit(sched_cpu));
+        }
 
         /* if scurr has higher priority and budget, still pick scurr */
         if ( !is_idle_unit(currunit) &&
-             unit_runnable(currunit) &&
+             unit_runnable_state(currunit) &&
              scurr->cur_budget > 0 &&
              ( is_idle_unit(snext->unit) ||
                compare_unit_priority(scurr, snext) > 0 ) )
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 180a225494..4f7f195915 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -278,7 +278,7 @@  static inline void sched_unit_runstate_change(struct sched_unit *unit,
     for_each_sched_unit_vcpu ( unit, v )
     {
         if ( running )
-            vcpu_runstate_change(v, RUNSTATE_running, new_entry_time);
+            vcpu_runstate_change(v, v->new_state, new_entry_time);
         else
             vcpu_runstate_change(v,
                 ((v->pause_flags & VPF_blocked) ? RUNSTATE_blocked :
diff --git a/xen/include/xen/sched-if.h b/xen/include/xen/sched-if.h
index c65dfa943b..7e568a9d9f 100644
--- a/xen/include/xen/sched-if.h
+++ b/xen/include/xen/sched-if.h
@@ -93,6 +93,36 @@  static inline bool unit_runnable(const struct sched_unit *unit)
     return false;
 }
 
+/*
+ * Returns whether a sched_unit is runnable and sets new_state for each of its
+ * vcpus. It is mandatory to determine the new runstate for all vcpus of a unit
+ * without dropping the schedule lock (which happens when synchronizing the
+ * context switch of the vcpus of a unit) in order to avoid races with e.g.
+ * vcpu_sleep().
+ */
+static inline bool unit_runnable_state(const struct sched_unit *unit)
+{
+    struct vcpu *v;
+    bool runnable, ret = false;
+
+    if ( is_idle_unit(unit) )
+        return true;
+
+    for_each_sched_unit_vcpu ( unit, v )
+    {
+        runnable = vcpu_runnable(v);
+
+        v->new_state = runnable ? RUNSTATE_running
+                                : (v->pause_flags & VPF_blocked)
+                                  ? RUNSTATE_blocked : RUNSTATE_offline;
+
+        if ( runnable )
+            ret = true;
+    }
+
+    return ret;
+}
+
 static inline void sched_set_res(struct sched_unit *unit,
                                  struct sched_resource *res)
 {
diff --git a/xen/include/xen/sched.h b/xen/include/xen/sched.h
index c770ab4aa0..12f00cd78d 100644
--- a/xen/include/xen/sched.h
+++ b/xen/include/xen/sched.h
@@ -174,6 +174,7 @@  struct vcpu
         XEN_GUEST_HANDLE(vcpu_runstate_info_compat_t) compat;
     } runstate_guest; /* guest address */
 #endif
+    unsigned int     new_state;
 
     /* Has the FPU been initialised? */
     bool             fpu_initialised;
