diff mbox series

[v3,1/8] xen/cpupool: support moving domain between cpupools with different granularity

Message ID 20201209160956.32456-2-jgross@suse.com (mailing list archive)
State Superseded
Series xen: support per-cpupool scheduling granularity

Commit Message

Jürgen Groß Dec. 9, 2020, 4:09 p.m. UTC
When moving a domain between cpupools with different scheduling
granularity the sched_units of the domain need to be adjusted.

Do that by allocating new sched_units and throwing away the old ones
in sched_move_domain().

Signed-off-by: Juergen Gross <jgross@suse.com>
---
 xen/common/sched/core.c | 121 ++++++++++++++++++++++++++++++----------
 1 file changed, 90 insertions(+), 31 deletions(-)

Comments

Dario Faggioli Dec. 16, 2020, 5:52 p.m. UTC | #1
On Wed, 2020-12-09 at 17:09 +0100, Juergen Gross wrote:
> When moving a domain between cpupools with different scheduling
> granularity the sched_units of the domain need to be adjusted.
> 
> Do that by allocating new sched_units and throwing away the old ones
> in sched_move_domain().
> 
> Signed-off-by: Juergen Gross <jgross@suse.com>
>
This looks fine, and can have:

Reviewed-by: Dario Faggioli <dfaggioli@suse.com>

I would only have one request. It's not a huge deal, and probably not
worth a resend only for that, but if either you or the committer are up
for complying with that in whatever way you find the most suitable,
that would be great.

I.e., can we...
> ---
>  xen/common/sched/core.c | 121 ++++++++++++++++++++++++++++++--------
> --
>  1 file changed, 90 insertions(+), 31 deletions(-)
> 
> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
> index a429fc7640..2a61c879b3 100644
> --- a/xen/common/sched/core.c
> +++ b/xen/common/sched/core.c
> 
> [...]
> -    old_ops = dom_scheduler(d);
>      old_domdata = d->sched_priv;
> 
Move *here* (i.e., above this new call to cpumask_first()) the comment
that is currently inside the loop?
>  
> +    new_p = cpumask_first(d->cpupool->cpu_valid);
>      for_each_sched_unit ( d, unit )
>      {
> +        spinlock_t *lock;
> +
> +        /*
> +         * Temporarily move all units to same processor to make
> locking
> +         * easier when moving the new units to the new processors.
> +         */
>
This one here, basically ^^^

> +        lock = unit_schedule_lock_irq(unit);
> +        sched_set_res(unit, get_sched_res(new_p));
> +        spin_unlock_irq(lock);
> +
>          sched_remove_unit(old_ops, unit);
>      }
>  
Thanks and Regards
Jan Beulich Dec. 17, 2020, 7:49 a.m. UTC | #2
On 16.12.2020 18:52, Dario Faggioli wrote:
> On Wed, 2020-12-09 at 17:09 +0100, Juergen Gross wrote:
>> When moving a domain between cpupools with different scheduling
>> granularity the sched_units of the domain need to be adjusted.
>>
>> Do that by allocating new sched_units and throwing away the old ones
>> in sched_move_domain().
>>
>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>
> This looks fine, and can have:
> 
> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
> 
> I would only have one request. It's not a huge deal, and probably not
> worth a resend only for that, but if either you or the committer are up
> for complying with that in whatever way you find the most suitable,
> that would be great.

I'd certainly be okay making this adjustment while committing, as
long as Jürgen agrees. With ...

> I.e., can we...
>> ---
>>  xen/common/sched/core.c | 121 ++++++++++++++++++++++++++++++--------
>> --
>>  1 file changed, 90 insertions(+), 31 deletions(-)
>>
>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>> index a429fc7640..2a61c879b3 100644
>> --- a/xen/common/sched/core.c
>> +++ b/xen/common/sched/core.c
>>
>> [...]
>> -    old_ops = dom_scheduler(d);
>>      old_domdata = d->sched_priv;
>>
> Move *here* (i.e., above this new call to cpumask_first()) the comment
> that is currently inside the loop?
>>  
>> +    new_p = cpumask_first(d->cpupool->cpu_valid);
>>      for_each_sched_unit ( d, unit )
>>      {
>> +        spinlock_t *lock;
>> +
>> +        /*
>> +         * Temporarily move all units to same processor to make
>> locking
>> +         * easier when moving the new units to the new processors.
>> +         */
>>
> This one here, basically ^^^

... this comment moved out of here, I'd be tempted to suggest to
make ...

>> +        lock = unit_schedule_lock_irq(unit);

... this the variable's initializer then at the same time.

Jan
Jürgen Groß Dec. 17, 2020, 7:54 a.m. UTC | #3
On 17.12.20 08:49, Jan Beulich wrote:
> On 16.12.2020 18:52, Dario Faggioli wrote:
>> On Wed, 2020-12-09 at 17:09 +0100, Juergen Gross wrote:
>>> When moving a domain between cpupools with different scheduling
>>> granularity the sched_units of the domain need to be adjusted.
>>>
>>> Do that by allocating new sched_units and throwing away the old ones
>>> in sched_move_domain().
>>>
>>> Signed-off-by: Juergen Gross <jgross@suse.com>
>>>
>> This looks fine, and can have:
>>
>> Reviewed-by: Dario Faggioli <dfaggioli@suse.com>
>>
>> I would only have one request. It's not a huge deal, and probably not
>> worth a resend only for that, but if either you or the committer are up
>> for complying with that in whatever way you find the most suitable,
>> that would be great.
> 
> I'd certainly be okay making this adjustment while committing, as
> long as Jürgen agrees. With ...
> 
>> I.e., can we...
>>> ---
>>>   xen/common/sched/core.c | 121 ++++++++++++++++++++++++++++++--------
>>> --
>>>   1 file changed, 90 insertions(+), 31 deletions(-)
>>>
>>> diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
>>> index a429fc7640..2a61c879b3 100644
>>> --- a/xen/common/sched/core.c
>>> +++ b/xen/common/sched/core.c
>>>
>>> [...]
>>> -    old_ops = dom_scheduler(d);
>>>       old_domdata = d->sched_priv;
>>>
>> Move *here* (i.e., above this new call to cpumask_first()) the comment
>> that is currently inside the loop?
>>>   
>>> +    new_p = cpumask_first(d->cpupool->cpu_valid);
>>>       for_each_sched_unit ( d, unit )
>>>       {
>>> +        spinlock_t *lock;
>>> +
>>> +        /*
>>> +         * Temporarily move all units to same processor to make
>>> locking
>>> +         * easier when moving the new units to the new processors.
>>> +         */
>>>
>> This one here, basically ^^^
> 
> ... this comment moved out of here, I'd be tempted to suggest to
> make ...
> 
>>> +        lock = unit_schedule_lock_irq(unit);
> 
> ... this the variable's initializer then at the same time.

Fine with me.


Juergen

Patch

diff --git a/xen/common/sched/core.c b/xen/common/sched/core.c
index a429fc7640..2a61c879b3 100644
--- a/xen/common/sched/core.c
+++ b/xen/common/sched/core.c
@@ -613,17 +613,45 @@  static void sched_move_irqs(const struct sched_unit *unit)
         vcpu_move_irqs(v);
 }
 
+/*
+ * Move a domain from one cpupool to another.
+ *
+ * A domain with any vcpu having temporary affinity settings will be denied
+ * to move. Hard and soft affinities will be reset.
+ *
+ * In order to support cpupools with different scheduling granularities all
+ * scheduling units are replaced by new ones.
+ *
+ * The complete move is done in the following steps:
+ * - check prerequisites (no vcpu with temporary affinities)
+ * - allocate all new data structures (scheduler specific domain data, unit
+ *   memory, scheduler specific unit data)
+ * - pause domain
+ * - temporarily move all (old) units to the same scheduling resource (this
+ *   makes the final resource assignment easier in case the new cpupool has
+ *   a larger granularity than the old one, as the scheduling locks for all
+ *   vcpus must be held for that operation)
+ * - remove old units from scheduling
+ * - set new cpupool and scheduler domain data pointers in struct domain
+ * - switch all vcpus to new units, still assigned to the old scheduling
+ *   resource
+ * - migrate all new units to scheduling resources of the new cpupool
+ * - unpause the domain
+ * - free the old memory (scheduler specific domain data, unit memory,
+ *   scheduler specific unit data)
+ */
 int sched_move_domain(struct domain *d, struct cpupool *c)
 {
     struct vcpu *v;
-    struct sched_unit *unit;
+    struct sched_unit *unit, *old_unit;
+    struct sched_unit *new_units = NULL, *old_units;
+    struct sched_unit **unit_ptr = &new_units;
     unsigned int new_p, unit_idx;
-    void **unit_priv;
     void *domdata;
-    void *unitdata;
-    struct scheduler *old_ops;
+    struct scheduler *old_ops = dom_scheduler(d);
     void *old_domdata;
     unsigned int gran = cpupool_get_granularity(c);
+    unsigned int n_units = DIV_ROUND_UP(d->max_vcpus, gran);
     int ret = 0;
 
     for_each_vcpu ( d, v )
@@ -641,53 +669,78 @@  int sched_move_domain(struct domain *d, struct cpupool *c)
         goto out;
     }
 
-    unit_priv = xzalloc_array(void *, DIV_ROUND_UP(d->max_vcpus, gran));
-    if ( unit_priv == NULL )
+    for ( unit_idx = 0; unit_idx < n_units; unit_idx++ )
     {
-        sched_free_domdata(c->sched, domdata);
-        ret = -ENOMEM;
-        goto out;
-    }
+        unit = sched_alloc_unit_mem();
+        if ( unit )
+        {
+            /* Initialize unit for sched_alloc_udata() to work. */
+            unit->domain = d;
+            unit->unit_id = unit_idx * gran;
+            unit->vcpu_list = d->vcpu[unit->unit_id];
+            unit->priv = sched_alloc_udata(c->sched, unit, domdata);
+            *unit_ptr = unit;
+        }
 
-    unit_idx = 0;
-    for_each_sched_unit ( d, unit )
-    {
-        unit_priv[unit_idx] = sched_alloc_udata(c->sched, unit, domdata);
-        if ( unit_priv[unit_idx] == NULL )
+        if ( !unit || !unit->priv )
         {
-            for ( unit_idx = 0; unit_priv[unit_idx]; unit_idx++ )
-                sched_free_udata(c->sched, unit_priv[unit_idx]);
-            xfree(unit_priv);
-            sched_free_domdata(c->sched, domdata);
+            old_units = new_units;
+            old_domdata = domdata;
             ret = -ENOMEM;
-            goto out;
+            goto out_free;
         }
-        unit_idx++;
+
+        unit_ptr = &unit->next_in_list;
     }
 
     domain_pause(d);
 
-    old_ops = dom_scheduler(d);
     old_domdata = d->sched_priv;
 
+    new_p = cpumask_first(d->cpupool->cpu_valid);
     for_each_sched_unit ( d, unit )
     {
+        spinlock_t *lock;
+
+        /*
+         * Temporarily move all units to same processor to make locking
+         * easier when moving the new units to the new processors.
+         */
+        lock = unit_schedule_lock_irq(unit);
+        sched_set_res(unit, get_sched_res(new_p));
+        spin_unlock_irq(lock);
+
         sched_remove_unit(old_ops, unit);
     }
 
+    old_units = d->sched_unit_list;
+
     d->cpupool = c;
     d->sched_priv = domdata;
 
+    unit = new_units;
+    for_each_vcpu ( d, v )
+    {
+        old_unit = v->sched_unit;
+        if ( unit->unit_id + gran == v->vcpu_id )
+            unit = unit->next_in_list;
+
+        unit->state_entry_time = old_unit->state_entry_time;
+        unit->runstate_cnt[v->runstate.state]++;
+        /* Temporarily use old resource assignment */
+        unit->res = get_sched_res(new_p);
+
+        v->sched_unit = unit;
+    }
+
+    d->sched_unit_list = new_units;
+
     new_p = cpumask_first(c->cpu_valid);
-    unit_idx = 0;
     for_each_sched_unit ( d, unit )
     {
         spinlock_t *lock;
         unsigned int unit_p = new_p;
 
-        unitdata = unit->priv;
-        unit->priv = unit_priv[unit_idx];
-
         for_each_sched_unit_vcpu ( unit, v )
         {
             migrate_timer(&v->periodic_timer, new_p);
@@ -713,8 +766,6 @@  int sched_move_domain(struct domain *d, struct cpupool *c)
 
         sched_insert_unit(c->sched, unit);
 
-        sched_free_udata(old_ops, unitdata);
-
         unit_idx++;
     }
 
@@ -722,11 +773,19 @@  int sched_move_domain(struct domain *d, struct cpupool *c)
 
     domain_unpause(d);
 
-    sched_free_domdata(old_ops, old_domdata);
+ out_free:
+    for ( unit = old_units; unit; )
+    {
+        if ( unit->priv )
+            sched_free_udata(c->sched, unit->priv);
+        old_unit = unit;
+        unit = unit->next_in_list;
+        xfree(old_unit);
+    }
 
-    xfree(unit_priv);
+    sched_free_domdata(old_ops, old_domdata);
 
-out:
+ out:
     rcu_read_unlock(&sched_res_rculock);
 
     return ret;