[2/4] xen: credit2: allow to set and get utilization cap

Message ID 149692373432.9605.11077306237899753280.stgit@Solace.fritz.box (mailing list archive)
State New, archived

Commit Message

Dario Faggioli June 8, 2017, 12:08 p.m. UTC
As cap is already present in Credit1, as a parameter, all
the wiring is there already for it to be percolate down
to csched2_dom_cntl() too.

In this commit, we actually deal with it, and implement
setting, changing or disabling the cap of a domain.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
---
Cc: George Dunlap <george.dunlap@eu.citrix.com>
Cc: Anshul Makkar <anshul.makkar@citrix.com>
---
 xen/common/sched_credit2.c  |  119 +++++++++++++++++++++++++++++++++++++++++--
 xen/include/public/domctl.h |    1 
 2 files changed, 115 insertions(+), 5 deletions(-)

Comments

George Dunlap June 28, 2017, 3:19 p.m. UTC | #1
On Thu, Jun 8, 2017 at 1:08 PM, Dario Faggioli
<dario.faggioli@citrix.com> wrote:
> As cap is already present in Credit1, as a parameter, all
> the wiring is there already for it to be percolate down
> to csched2_dom_cntl() too.
>
> In this commit, we actually deal with it, and implement
> setting, changing or disabling the cap of a domain.
>
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>

BTW +1 the decision to put this in a separate patch.  I think it made
review easier.

> ---
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> ---
>  xen/common/sched_credit2.c  |  119 +++++++++++++++++++++++++++++++++++++++++--
>  xen/include/public/domctl.h |    1
>  2 files changed, 115 insertions(+), 5 deletions(-)
>
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index ba4bf4b..3f7b8f0 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -2498,30 +2498,35 @@ csched2_dom_cntl(
>      struct csched2_dom * const sdom = csched2_dom(d);
>      struct csched2_private *prv = csched2_priv(ops);
>      unsigned long flags;
> +    struct vcpu *v;
>      int rc = 0;
>
>      /*
>       * Locking:
>       *  - we must take the private lock for accessing the weights of the
> -     *    vcpus of d,
> +     *    vcpus of d, and/or the cap;
>       *  - in the putinfo case, we also need the runqueue lock(s), for
>       *    updating the max waight of the runqueue(s).
> +     *    If changing the cap, we also need the budget_lock, for updating
> +     *    the value of the domain budget pool (and the runqueue lock,
> +     *    for adjusting the parameters and rescheduling any vCPU that is
> +     *    running at the time of the change).
>       */
>      switch ( op->cmd )
>      {
>      case XEN_DOMCTL_SCHEDOP_getinfo:
>          read_lock_irqsave(&prv->lock, flags);
>          op->u.credit2.weight = sdom->weight;
> +        op->u.credit2.cap = sdom->cap;
>          read_unlock_irqrestore(&prv->lock, flags);
>          break;
>      case XEN_DOMCTL_SCHEDOP_putinfo:
> +        write_lock_irqsave(&prv->lock, flags);
> +        /* Weight */
>          if ( op->u.credit2.weight != 0 )
>          {
> -            struct vcpu *v;
>              int old_weight;
>
> -            write_lock_irqsave(&prv->lock, flags);
> -
>              old_weight = sdom->weight;
>
>              sdom->weight = op->u.credit2.weight;
> @@ -2539,9 +2544,113 @@ csched2_dom_cntl(
>
>                  vcpu_schedule_unlock(lock, svc->vcpu);
>              }
> +        }
> +        /* Cap */
> +        if ( op->u.credit2.cap != 0 )
> +        {
> +            spin_lock(&sdom->budget_lock);
> +            sdom->tot_budget = (CSCHED2_BDGT_REPL_PERIOD / 100) * op->u.credit2.cap;

When doing integer arithmetic like this, I think it's usually better
to do the multiply first -- unless you're afraid of overflow, which
shouldn't (in theory) be an issue here.

Speaking of which -- we probably want to make sure 'cap' is <= 100 * nvcpus. :-)

> +            spin_unlock(&sdom->budget_lock);
> +
> +            if ( sdom->cap == 0 )
> +            {
> +                /*
> +                 * Let's give to the domain the budget it is entitled of,

"entitled to"

Although if you want to be strictly correct (i.e., following the
official rules rather than the way people actually speak) you should
say "to which it is entitled" (not supposed to have a dangling
preposition).  I'll leave it up to you. :-)

> +                 * and queue its first replenishment event.
> +                 *
> +                 * Since cap is currently disabled for this domain, we
> +                 * know no vCPU is messing with the domain's budget, and
> +                 * the replenishment timer is still off.
> +                 * For these reasons, it is safe to do the following without
> +                 * taking the budget_lock.
> +                 */
> +                sdom->budget = sdom->tot_budget;
> +                sdom->next_repl = NOW() + CSCHED2_BDGT_REPL_PERIOD;
> +                set_timer(&sdom->repl_timer, sdom->next_repl);
> +
> +                /*
> +                 * Now, let's enable budget accounting for all the vCPUs.
> +                 * For making sure that they will start to honour the domain's
> +                 * cap, we set their budget to 0.
> +                 * This way, as soon as they will try to run, they will have
> +                 * to get some budget.
> +                 *
> +                 * For the vCPUs that are already running, we trigger the
> +                 * scheduler on their pCPU. When, as a consequence of this,
> +                 * csched2_schedule() will run, it will figure out there is
> +                 * no budget, and the vCPU will try to get some (and be parked,
> +                 * if there's none, and we'll switch to someone else).
> +                 */
> +                for_each_vcpu ( d, v )
> +                {
> +                    struct csched2_vcpu *svc = csched2_vcpu(v);
> +                    spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
> +
> +                    if ( v->is_running )
> +                    {
> +                        unsigned int cpu = v->processor;
> +                        struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
> +
> +                        ASSERT(curr_on_cpu(cpu) == v);
> +
> +                        /*
> +                         * We are triggering a reschedule on the vCPU's
> +                         * pCPU. That will run burn_credits() and, since
> +                         * the vCPU is capped now, it would charge all the
> +                         * execution time of this last round as budget as
> +                         * well. That will make the vCPU budget go negative,
> +                         * potentially by a large amount, and it's unfair.
> +                         *
> +                         * To avoid that, call burn_credit() here, to do the
> +                         * accounting of this current running instance now,
> +                         * with budgetting still disabled. This does not
> +                         * prevent some small amount of budget being charged
> +                         * to the vCPU (i.e., the amount of time it runs from
> +                         * now, to when scheduling happens). The budget will
> +                         * also go below 0, but a lot less than how it would
> +                         * if we don't do this.
> +                         */
> +                        burn_credits(rqd, svc, NOW());
> +                        __cpumask_set_cpu(cpu, &rqd->tickled);
> +                        ASSERT(!cpumask_test_cpu(cpu, &rqd->smt_idle));
> +                        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
> +                    }
> +                    svc->budget = 0;
> +                    vcpu_schedule_unlock(lock, svc->vcpu);
> +                }
> +            }
> +            sdom->cap = op->u.credit2.cap;
> +        }
> +        else if ( sdom->cap != 0 )
> +        {
> +            stop_timer(&sdom->repl_timer);
> +
> +            /* Disable budget accounting for all the vCPUs. */
> +            for_each_vcpu ( d, v )
> +            {
> +                struct csched2_vcpu *svc = csched2_vcpu(v);
> +                spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
> +
> +                svc->budget = STIME_MAX;
>
> -            write_unlock_irqrestore(&prv->lock, flags);
> +                vcpu_schedule_unlock(lock, svc->vcpu);
> +            }
> +            sdom->cap = 0;
> +            /*
> +             * We are disabling the cap for this domain, which may have vCPUs
> +             * waiting for a replenishment, and we need to unpark them all.
> +             * Parked vcpus sit in the domain's parked_vcpus list, which would
> +             * require being manipulated with the budget_lock held. However,
> +             * we have already disabled budget accounting for all the vCPUs of
> +             * this domain in the loop above, and therefore, no vCPU will run
> +             * out of budget and need being added to the list.
> +             *
> +             * For this reason, it is safe, in this case, to just go ahead and
> +             * drain the list, without the need of taking the budget_lock.
> +             */
> +            unpark_parked_vcpus(ops, &sdom->parked_vcpus);

I think it is safe currently.  But is there any reason not to just
grab the lock anyway?  We don't expect cap adjustment actions to be
that common, and it would mean less chance of error in the future.

I'm not 100% set on grabbing the budget lock, but I do think it's a better idea.

Other than that looks good, thanks!

 -George
Alan Robinson June 29, 2017, 7:39 a.m. UTC | #2
On Thu, Jun 08, 2017 at 02:08:54PM +0200, Dario Faggioli wrote:
> As cap is already present in Credit1, as a parameter, all
> the wiring is there already for it to be percolate down

s/be //

> to csched2_dom_cntl() too.

Alan
George Dunlap June 29, 2017, 8:26 a.m. UTC | #3
On 29/06/17 08:39, Alan Robinson wrote:
> On Thu, Jun 08, 2017 at 02:08:54PM +0200, Dario Faggioli wrote:
>> As cap is already present in Credit1, as a parameter, all
>> the wiring is there already for it to be percolate down
> 
> s/be //

or "to be percolated down"

 -G
Dario Faggioli June 29, 2017, 10:21 a.m. UTC | #4
On Wed, 2017-06-28 at 16:19 +0100, George Dunlap wrote:
> On Thu, Jun 8, 2017 at 1:08 PM, Dario Faggioli
> <dario.faggioli@citrix.com> wrote:
> > 
> > +            /*
> > +             * We are disabling the cap for this domain, which may
> > have vCPUs
> > +             * waiting for a replenishment, and we need to unpark
> > them all.
> > +             * Parked vcpus sit in the domain's parked_vcpus list,
> > which would
> > +             * require being manipulated with the budget_lock
> > held. However,
> > +             * we have already disabled budget accounting for all
> > the vCPUs of
> > +             * this domain in the loop above, and therefore, no
> > vCPU will run
> > +             * out of budget and need being added to the list.
> > +             *
> > +             * For this reason, it is safe, in this case, to just
> > go ahead and
> > +             * drain the list, without the need of taking the
> > budget_lock.
> > +             */
> > +            unpark_parked_vcpus(ops, &sdom->parked_vcpus);
> 
> I think it is safe currently.  But is there any reason not to just
> grab the lock anyway?  We don't expect cap adjustment actions to be
> that common, and it would mean less chance of error in the future.
> 
The code for doing that properly is a little more involved. That's
because unpark_parked_vcpus() takes runqueue locks, and hence can't be
called with the budget lock held anyway.

We'd therefore need to do the "trick" of the temporary list, as we do
already in repl_sdom_budget() and csched2_context_saved().

That's why I did not do that here... but, after all, if I actually do
things that way, I can kill the comment, so probably the number of
changed lines for this patch would be the same (if not smaller! :-D).

I'll think about it (with a bias toward taking the lock, as you
suggest).

Thanks and Regards,
Dario
Patch

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index ba4bf4b..3f7b8f0 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -2498,30 +2498,35 @@  csched2_dom_cntl(
     struct csched2_dom * const sdom = csched2_dom(d);
     struct csched2_private *prv = csched2_priv(ops);
     unsigned long flags;
+    struct vcpu *v;
     int rc = 0;
 
     /*
      * Locking:
      *  - we must take the private lock for accessing the weights of the
-     *    vcpus of d,
+     *    vcpus of d, and/or the cap;
      *  - in the putinfo case, we also need the runqueue lock(s), for
      *    updating the max waight of the runqueue(s).
+     *    If changing the cap, we also need the budget_lock, for updating
+     *    the value of the domain budget pool (and the runqueue lock,
+     *    for adjusting the parameters and rescheduling any vCPU that is
+     *    running at the time of the change).
      */
     switch ( op->cmd )
     {
     case XEN_DOMCTL_SCHEDOP_getinfo:
         read_lock_irqsave(&prv->lock, flags);
         op->u.credit2.weight = sdom->weight;
+        op->u.credit2.cap = sdom->cap;
         read_unlock_irqrestore(&prv->lock, flags);
         break;
     case XEN_DOMCTL_SCHEDOP_putinfo:
+        write_lock_irqsave(&prv->lock, flags);
+        /* Weight */
         if ( op->u.credit2.weight != 0 )
         {
-            struct vcpu *v;
             int old_weight;
 
-            write_lock_irqsave(&prv->lock, flags);
-
             old_weight = sdom->weight;
 
             sdom->weight = op->u.credit2.weight;
@@ -2539,9 +2544,113 @@  csched2_dom_cntl(
 
                 vcpu_schedule_unlock(lock, svc->vcpu);
             }
+        }
+        /* Cap */
+        if ( op->u.credit2.cap != 0 )
+        {
+            spin_lock(&sdom->budget_lock);
+            sdom->tot_budget = (CSCHED2_BDGT_REPL_PERIOD / 100) * op->u.credit2.cap;
+            spin_unlock(&sdom->budget_lock);
+
+            if ( sdom->cap == 0 )
+            {
+                /*
+                 * Let's give to the domain the budget it is entitled of,
+                 * and queue its first replenishment event.
+                 *
+                 * Since cap is currently disabled for this domain, we
+                 * know no vCPU is messing with the domain's budget, and
+                 * the replenishment timer is still off.
+                 * For these reasons, it is safe to do the following without
+                 * taking the budget_lock.
+                 */
+                sdom->budget = sdom->tot_budget;
+                sdom->next_repl = NOW() + CSCHED2_BDGT_REPL_PERIOD;
+                set_timer(&sdom->repl_timer, sdom->next_repl);
+
+                /*
+                 * Now, let's enable budget accounting for all the vCPUs.
+                 * For making sure that they will start to honour the domain's
+                 * cap, we set their budget to 0.
+                 * This way, as soon as they will try to run, they will have
+                 * to get some budget.
+                 *
+                 * For the vCPUs that are already running, we trigger the
+                 * scheduler on their pCPU. When, as a consequence of this,
+                 * csched2_schedule() will run, it will figure out there is
+                 * no budget, and the vCPU will try to get some (and be parked,
+                 * if there's none, and we'll switch to someone else).
+                 */
+                for_each_vcpu ( d, v )
+                {
+                    struct csched2_vcpu *svc = csched2_vcpu(v);
+                    spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
+
+                    if ( v->is_running )
+                    {
+                        unsigned int cpu = v->processor;
+                        struct csched2_runqueue_data *rqd = c2rqd(ops, cpu);
+
+                        ASSERT(curr_on_cpu(cpu) == v);
+
+                        /*
+                         * We are triggering a reschedule on the vCPU's
+                         * pCPU. That will run burn_credits() and, since
+                         * the vCPU is capped now, it would charge all the
+                         * execution time of this last round as budget as
+                         * well. That will make the vCPU budget go negative,
+                         * potentially by a large amount, and it's unfair.
+                         *
+                         * To avoid that, call burn_credit() here, to do the
+                         * accounting of this current running instance now,
+                         * with budgetting still disabled. This does not
+                         * prevent some small amount of budget being charged
+                         * to the vCPU (i.e., the amount of time it runs from
+                         * now, to when scheduling happens). The budget will
+                         * also go below 0, but a lot less than how it would
+                         * if we don't do this.
+                         */
+                        burn_credits(rqd, svc, NOW());
+                        __cpumask_set_cpu(cpu, &rqd->tickled);
+                        ASSERT(!cpumask_test_cpu(cpu, &rqd->smt_idle));
+                        cpu_raise_softirq(cpu, SCHEDULE_SOFTIRQ);
+                    }
+                    svc->budget = 0;
+                    vcpu_schedule_unlock(lock, svc->vcpu);
+                }
+            }
+            sdom->cap = op->u.credit2.cap;
+        }
+        else if ( sdom->cap != 0 )
+        {
+            stop_timer(&sdom->repl_timer);
+
+            /* Disable budget accounting for all the vCPUs. */
+            for_each_vcpu ( d, v )
+            {
+                struct csched2_vcpu *svc = csched2_vcpu(v);
+                spinlock_t *lock = vcpu_schedule_lock(svc->vcpu);
+
+                svc->budget = STIME_MAX;
 
-            write_unlock_irqrestore(&prv->lock, flags);
+                vcpu_schedule_unlock(lock, svc->vcpu);
+            }
+            sdom->cap = 0;
+            /*
+             * We are disabling the cap for this domain, which may have vCPUs
+             * waiting for a replenishment, and we need to unpark them all.
+             * Parked vcpus sit in the domain's parked_vcpus list, which would
+             * require being manipulated with the budget_lock held. However,
+             * we have already disabled budget accounting for all the vCPUs of
+             * this domain in the loop above, and therefore, no vCPU will run
+             * out of budget and need being added to the list.
+             *
+             * For this reason, it is safe, in this case, to just go ahead and
+             * drain the list, without the need of taking the budget_lock.
+             */
+            unpark_parked_vcpus(ops, &sdom->parked_vcpus);
         }
+        write_unlock_irqrestore(&prv->lock, flags);
         break;
     default:
         rc = -EINVAL;
diff --git a/xen/include/public/domctl.h b/xen/include/public/domctl.h
index 515c603..795972e 100644
--- a/xen/include/public/domctl.h
+++ b/xen/include/public/domctl.h
@@ -355,6 +355,7 @@  typedef struct xen_domctl_sched_credit {
 
 typedef struct xen_domctl_sched_credit2 {
     uint16_t weight;
+    uint16_t cap;
 } xen_domctl_sched_credit2_t;
 
 typedef struct xen_domctl_sched_rtds {