| Message ID | 20190914085251.18816-34-jgross@suse.com (mailing list archive) |
|---|---|
| State | Superseded |
| Series | xen: add core scheduling support |
On 14.09.2019 10:52, Juergen Gross wrote:
> --- a/xen/common/schedule.c
> +++ b/xen/common/schedule.c
> @@ -71,6 +71,7 @@ static void poll_timer_fn(void *data);
>  /* This is global for now so that private implementations can reach it */
>  DEFINE_PER_CPU(struct scheduler *, scheduler);
>  DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res);
> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx);

It's of course not very helpful that this variable (right here) doesn't
ever get written to, and hence one can't judge where / how this is to
be done (without going to look elsewhere). But I guess that calculation
can't be moved into this patch (accepting that it would always yield
zero for now)?

> @@ -135,6 +136,12 @@ static struct scheduler sched_idle_ops = {
>      .switch_sched = sched_idle_switch_sched,
>  };
>
> +static inline struct vcpu *sched_unit2vcpu_cpu(struct sched_unit *unit,

const (on the parameter)?

> +                                               unsigned int cpu)
> +{
> +    return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)];
> +}
> +
>  static inline struct scheduler *dom_scheduler(const struct domain *d)
>  {
>      if ( likely(d->cpupool != NULL) )
> @@ -2008,7 +2015,7 @@ static void sched_slave(void)
>
>      pcpu_schedule_unlock_irq(lock, cpu);
>
> -    sched_context_switch(vprev, next->vcpu_list, now);
> +    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now);
>  }

By the end of the series this will be

    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu),
                         is_idle_unit(next) && !is_idle_unit(prev), now);

clarifying (I think) that "next" can indeed be an idle unit. I'm having
difficulty seeing how this can produce the correct result in all three
cases
- all vCPU-s in the unit belong to a real domain
- all vCPU-s in the unit belong to the idle domain
- there's a mix of "real" and (filler) "idle" vCPU-s
My main question is what "next" is in the last of the 3 cases above.
Considering that scheduling happens in terms of units, I'd expect it to
be a real domain's unit, yet then I don't see how you'd get at the
correct (idle) vCPU when calling sched_unit2vcpu_cpu(next, cpu).

Jan
On 24.09.19 12:05, Jan Beulich wrote:
> On 14.09.2019 10:52, Juergen Gross wrote:
>> --- a/xen/common/schedule.c
>> +++ b/xen/common/schedule.c
>> @@ -71,6 +71,7 @@ static void poll_timer_fn(void *data);
>>  /* This is global for now so that private implementations can reach it */
>>  DEFINE_PER_CPU(struct scheduler *, scheduler);
>>  DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res);
>> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx);
>
> It's of course not very helpful that this variable (right here) doesn't
> ever get written to, and hence one can't judge where / how this is to
> be done (without going to look elsewhere). But I guess that calculation
> can't be moved into this patch (accepting that it would always yield
> zero for now)?

Not easily. I can set sched_res_idx to 0 explicitly when initializing
a new cpu if you like that better.

>> @@ -135,6 +136,12 @@ static struct scheduler sched_idle_ops = {
>>      .switch_sched = sched_idle_switch_sched,
>>  };
>>
>> +static inline struct vcpu *sched_unit2vcpu_cpu(struct sched_unit *unit,
>
> const (on the parameter)?

Okay.

>> +                                               unsigned int cpu)
>> +{
>> +    return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)];
>> +}
>> +
>>  static inline struct scheduler *dom_scheduler(const struct domain *d)
>>  {
>>      if ( likely(d->cpupool != NULL) )
>> @@ -2008,7 +2015,7 @@ static void sched_slave(void)
>>
>>      pcpu_schedule_unlock_irq(lock, cpu);
>>
>> -    sched_context_switch(vprev, next->vcpu_list, now);
>> +    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now);
>>  }
>
> By the end of the series this will be
>
>     sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu),
>                          is_idle_unit(next) && !is_idle_unit(prev), now);
>
> clarifying (I think) that "next" can indeed be an idle unit. I'm having
> difficulty seeing how this can produce the correct result in all three
> cases
> - all vCPU-s in the unit belong to a real domain
> - all vCPU-s in the unit belong to the idle domain
> - there's a mix of "real" and (filler) "idle" vCPU-s
> My main question is what "next" is in the last of the 3 cases above.

next points to the real domain's unit.

> Considering that scheduling happens in terms of units, I'd expect it to
> be a real domain's unit, yet then I don't see how you'd get at the
> correct (idle) vCPU when calling sched_unit2vcpu_cpu(next, cpu).

What is the problem?

sched_unit2vcpu_cpu() will first get the real domain's vcpu for that
cpu (if existing) and replace it by the idle vcpu of that cpu if the
real domain's vcpu is either not existing or won't be running. The main
trick is that we use idle_vcpu[cpu] (which is existing today already).


Juergen
On 24.09.2019 17:20, Jürgen Groß wrote:
> On 24.09.19 12:05, Jan Beulich wrote:
>> On 14.09.2019 10:52, Juergen Gross wrote:
>>> --- a/xen/common/schedule.c
>>> +++ b/xen/common/schedule.c
>>> @@ -71,6 +71,7 @@ static void poll_timer_fn(void *data);
>>>  /* This is global for now so that private implementations can reach it */
>>>  DEFINE_PER_CPU(struct scheduler *, scheduler);
>>>  DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res);
>>> +static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx);
>>
>> It's of course not very helpful that this variable (right here) doesn't
>> ever get written to, and hence one can't judge where / how this is to
>> be done (without going to look elsewhere). But I guess that calculation
>> can't be moved into this patch (accepting that it would always yield
>> zero for now)?
>
> Not easily. I can set sched_res_idx to 0 explicitly when initializing
> a new cpu if you like that better.

No, not really. Then better leave it as is.

>>> @@ -2008,7 +2015,7 @@ static void sched_slave(void)
>>>
>>>      pcpu_schedule_unlock_irq(lock, cpu);
>>>
>>> -    sched_context_switch(vprev, next->vcpu_list, now);
>>> +    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now);
>>>  }
>>
>> By the end of the series this will be
>>
>>     sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu),
>>                          is_idle_unit(next) && !is_idle_unit(prev), now);
>>
>> clarifying (I think) that "next" can indeed be an idle unit. I'm having
>> difficulty seeing how this can produce the correct result in all three
>> cases
>> - all vCPU-s in the unit belong to a real domain
>> - all vCPU-s in the unit belong to the idle domain
>> - there's a mix of "real" and (filler) "idle" vCPU-s
>> My main question is what "next" is in the last of the 3 cases above.
>
> next points to the real domain's unit.
>
>> Considering that scheduling happens in terms of units, I'd expect it to
>> be a real domain's unit, yet then I don't see how you'd get at the
>> correct (idle) vCPU when calling sched_unit2vcpu_cpu(next, cpu).
>
> What is the problem?
>
> sched_unit2vcpu_cpu() will first get the real domain's vcpu for that
> cpu (if existing) and replace it by the idle vcpu of that cpu if the
> real domain's vcpu is either not existing or won't be running. The main
> trick is that we use idle_vcpu[cpu] (which is existing today already).

Yeah, I guess this has become more clear by looking at the next patch.
Without seeing the final shape sched_unit2vcpu_cpu() will take (i.e.
including the trick you talk about) it wasn't really clear to me what
is (supposed to be) going on here. (Makes me wonder once again whether
that function couldn't have taken its final shape right away. But
anyway.)

Jan
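To illustrate the behaviour Juergen describes, here is a minimal sketch of the shape sched_unit2vcpu_cpu() could take later in the series. It is not the code of this patch: the bounds check against max_vcpus and the new_state / RUNSTATE_running test are assumptions derived from the explanation above (new_state only appears in later patches of the series).

    static inline struct vcpu *sched_unit2vcpu_cpu(const struct sched_unit *unit,
                                                   unsigned int cpu)
    {
        unsigned int idx = unit->unit_id + per_cpu(sched_res_idx, cpu);
        const struct domain *d = unit->domain;

        /* Pick the real domain's vcpu for this cpu, if there is one... */
        if ( idx < d->max_vcpus && d->vcpu[idx] != NULL )
        {
            struct vcpu *v = d->vcpu[idx];

            /* ... and only if it is actually going to run (assumed field). */
            if ( v->new_state == RUNSTATE_running )
                return v;
        }

        /* Otherwise fall back to the per-cpu idle vcpu. */
        return idle_vcpu[cpu];
    }

With such a shape, a "real" unit containing a filler idle vCPU for this cpu simply resolves to idle_vcpu[cpu], covering the mixed case Jan asks about.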
diff --git a/xen/common/schedule.c b/xen/common/schedule.c
index 5e34008ca8..246ad38c7d 100644
--- a/xen/common/schedule.c
+++ b/xen/common/schedule.c
@@ -71,6 +71,7 @@ static void poll_timer_fn(void *data);
 /* This is global for now so that private implementations can reach it */
 DEFINE_PER_CPU(struct scheduler *, scheduler);
 DEFINE_PER_CPU_READ_MOSTLY(struct sched_resource *, sched_res);
+static DEFINE_PER_CPU_READ_MOSTLY(unsigned int, sched_res_idx);
 
 /* Scratch space for cpumasks. */
 DEFINE_PER_CPU(cpumask_t, cpumask_scratch);
@@ -135,6 +136,12 @@ static struct scheduler sched_idle_ops = {
     .switch_sched = sched_idle_switch_sched,
 };
 
+static inline struct vcpu *sched_unit2vcpu_cpu(struct sched_unit *unit,
+                                               unsigned int cpu)
+{
+    return unit->domain->vcpu[unit->unit_id + per_cpu(sched_res_idx, cpu)];
+}
+
 static inline struct scheduler *dom_scheduler(const struct domain *d)
 {
     if ( likely(d->cpupool != NULL) )
@@ -2008,7 +2015,7 @@ static void sched_slave(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    sched_context_switch(vprev, next->vcpu_list, now);
+    sched_context_switch(vprev, sched_unit2vcpu_cpu(next, cpu), now);
 }
 
 /*
@@ -2067,7 +2074,7 @@ static void schedule(void)
 
     pcpu_schedule_unlock_irq(lock, cpu);
 
-    vnext = next->vcpu_list;
+    vnext = sched_unit2vcpu_cpu(next, cpu);
 
     sched_context_switch(vprev, vnext, now);
 }
Add a percpu variable holding the index of the cpu in the current
sched_resource structure. This index is used to get the correct vcpu
of a sched_unit on a specific cpu.

For now this index will be zero for all cpus, but with core scheduling
it will be possible to have higher values, too.

Signed-off-by: Juergen Gross <jgross@suse.com>
---
RFC V2: new patch (carved out from RFC V1 patch 49)
---
 xen/common/schedule.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)
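For context on how non-zero values could arise once a sched_resource spans more than one cpu (e.g. all sibling threads of a core), a rough sketch is given below. The helper name and the way the resource's cpu mask is obtained are purely hypothetical and not part of this patch:

    /* Hypothetical helper: record cpu's position within its sched_resource. */
    static void sched_res_idx_set(unsigned int cpu, const cpumask_t *res_cpus)
    {
        unsigned int idx = 0, i;

        /* Count how many cpus of the resource precede "cpu". */
        for_each_cpu ( i, res_cpus )
        {
            if ( i == cpu )
                break;
            idx++;
        }

        per_cpu(sched_res_idx, cpu) = idx;
    }

With one cpu per sched_resource (the state after this patch) the index stays zero; with two sibling threads per resource the second sibling would get index 1, so unit_id + sched_res_idx selects that sibling's own vcpu within the unit.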