[3/7] xen: credit2: soft-affinity awareness in fallback_cpu()

Message ID 149762243723.11899.13163340131516329714.stgit@Solace.fritz.box (mailing list archive)
State New, archived

Commit Message

Dario Faggioli June 16, 2017, 2:13 p.m. UTC
By, basically, moving all the logic of the function
inside the usual two steps (soft-affinity step and
hard-affinity step) loop.

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
---
Cc: Anshul Makkar <anshul.makkar@citrix.com>
Cc: George Dunlap <george.dunlap@eu.citrix.com>
---
George, you gave your Reviewed-by to:
 https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02201.html

which was adding soft-affinity awareness to both fallback_cpu and cpu_pick(). See here:
 https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg03259.html

I changed the cpu_pick() part a lot, and that's why I decided to split the
patch in two.  As far as fallback_cpu(), though, what's done in this patch is
exactly the same that was being done in the original one.

So, of course I'm dropping the Rev-by, but I thought it could have been useful
to mention this. :-)
---
 xen/common/sched_credit2.c |   77 ++++++++++++++++++++++++++++++++------------
 1 file changed, 56 insertions(+), 21 deletions(-)

Comments

George Dunlap July 25, 2017, 10:19 a.m. UTC | #1
On 06/16/2017 03:13 PM, Dario Faggioli wrote:
> By, basically, moving all the logic of the function
> inside the usual two steps (soft-affinity step and
> hard-affinity step) loop.
> 
> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
> ---
> Cc: Anshul Makkar <anshul.makkar@citrix.com>
> Cc: George Dunlap <george.dunlap@eu.citrix.com>
> ---
> George, you gave your Reviewed-by to:
>  https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02201.html
> 
> which was adding soft-affinity awareness to both fallback_cpu and cpu_pick(). See here:
>  https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg03259.html
> 
> I changed the cpu_pick() part a lot, and that's why I decided to split the
> patch in two.  As far as fallback_cpu(), though, what's done in this patch is
> exactly the same that was being done in the original one.
> 
> So, of course I'm dropping the Rev-by, but I thought it could have been useful
> to mention this. :-)
> ---
>  xen/common/sched_credit2.c |   77 ++++++++++++++++++++++++++++++++------------
>  1 file changed, 56 insertions(+), 21 deletions(-)
> 
> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
> index c749d4e..54f6e21 100644
> --- a/xen/common/sched_credit2.c
> +++ b/xen/common/sched_credit2.c
> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>  }
>  
>  /*
> - * When a hard affinity change occurs, we may not be able to check some
> - * (any!) of the other runqueues, when looking for the best new processor
> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
> - * pick, in order of decreasing preference:
> - *  - svc's current pcpu;
> - *  - another pcpu from svc's current runq;
> - *  - any cpu.
> + * In csched2_cpu_pick(), it may not be possible to actually look at remote
> + * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
> + * we pick, in order of decreasing preference:
> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> + *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
> + *  3) just one valid pcpu from svc's soft affinity;
> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
> + *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
> + *  6) just one valid pcpu from svc's hard affinity
> + *
> + * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
> + * note that at least 6 is guaranteed to _always_ return at least one pcpu.
>   */
>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>  {
>      struct vcpu *v = svc->vcpu;
> -    int cpu = v->processor;
> +    unsigned int bs;
>  
> -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> -                cpupool_domain_cpumask(v->domain));
> +    for_each_affinity_balance_step( bs )
> +    {
> +        int cpu = v->processor;
> +
> +        if ( bs == BALANCE_SOFT_AFFINITY &&
> +             !has_soft_affinity(v, v->cpu_hard_affinity) )
> +            continue;
>  
> -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> -        return cpu;
> +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                    cpupool_domain_cpumask(v->domain));
>  
> -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> -                                   &svc->rqd->active)) )
> -    {
> -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
> -                    cpumask_scratch_cpu(cpu));
> -        return cpumask_first(cpumask_scratch_cpu(cpu));
> -    }
> +        /*
> +         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
> +         * in our affinity, go for it, for cache betterness.
> +         */
> +        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> +            return cpu;
>  
> -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
> +        /*
> +         * This is cases 2 or 5 (depending on bs): v->processor isn't there
> +         * any longer, check if we at least can stay in our current runq.
> +         */
> +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> +                                       &svc->rqd->active)) )
> +        {
> +            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
> +                        &svc->rqd->active);
> +            return cpumask_first(cpumask_scratch_cpu(cpu));
> +        }
>  
> -    return cpumask_first(cpumask_scratch_cpu(cpu));
> +        /*
> +         * This is cases 3 or 6 (depending on bs): last stand, just one valid
> +         * pcpu from our soft affinity, if we have one and if there's any. In
> +         * fact, if we are doing soft-affinity, it is possible that we fail,
> +         * which means we stay in the loop and look for hard affinity. OTOH,
> +         * if we are at the hard-affinity balancing step, it's guaranteed that
> +         * there is at least one valid cpu, and therefore we are sure that we
> +         * return it, and never really exit the loop.
> +         */
> +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
> +               bs == BALANCE_SOFT_AFFINITY);
> +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));

So just checking my understanding here... at this point we're not taking
into consideration load or idleness or anything else -- we're just
saying, "Is there a cpu in my soft affinity it is *possible* to run on?"
 So on a properly configured system, we should never take the second
iteration of the loop?

> +        if ( likely(cpu < nr_cpu_ids) )
> +            return cpu;
> +    }
> +    BUG_ON(1);

Do we want to BUG() here?  I don't think this constitutes an
unrecoverable error; an ASSERT_UNREACHABLE() plus something random would
be better, wouldn't it?

 -George
George Dunlap July 25, 2017, 10:20 a.m. UTC | #2
On 07/25/2017 11:19 AM, George Dunlap wrote:
> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
>> By, basically, moving all the logic of the function
>> inside the usual two steps (soft-affinity step and
>> hard-affinity step) loop.
>>
>> Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
>> Signed-off-by: Justin T. Weaver <jtweaver@hawaii.edu>
>> ---
>> Cc: Anshul Makkar <anshul.makkar@citrix.com>
>> Cc: George Dunlap <george.dunlap@eu.citrix.com>
>> ---
>> George, you gave your Reviewed-by to:
>>  https://lists.xenproject.org/archives/html/xen-devel/2016-08/msg02201.html
>>
>> which was adding soft-affinity awareness to both fallback_cpu and cpu_pick(). See here:
>>  https://lists.xenproject.org/archives/html/xen-devel/2016-09/msg03259.html
>>
>> I changed the cpu_pick() part a lot, and that's why I decided to split the
>> patch in two.  As far as fallback_cpu(), though, what's done in this patch is
>> exactly the same that was being done in the original one.
>>
>> So, of course I'm dropping the Rev-by, but I thought it could have been useful
>> to mention this. :-)
>> ---
>>  xen/common/sched_credit2.c |   77 ++++++++++++++++++++++++++++++++------------
>>  1 file changed, 56 insertions(+), 21 deletions(-)
>>
>> diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
>> index c749d4e..54f6e21 100644
>> --- a/xen/common/sched_credit2.c
>> +++ b/xen/common/sched_credit2.c
>> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
>>  }
>>  
>>  /*
>> - * When a hard affinity change occurs, we may not be able to check some
>> - * (any!) of the other runqueues, when looking for the best new processor
>> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
>> - * pick, in order of decreasing preference:
>> - *  - svc's current pcpu;
>> - *  - another pcpu from svc's current runq;
>> - *  - any cpu.
>> + * In csched2_cpu_pick(), it may not be possible to actually look at remote
>> + * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
>> + * we pick, in order of decreasing preference:
>> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
>> + *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
>> + *  3) just one valid pcpu from svc's soft affinity;
>> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
>> + *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
>> + *  6) just one valid pcpu from svc's hard affinity
>> + *
>> + * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
>> + * note that at least 6 is guaranteed to _always_ return at least one pcpu.
>>   */
>>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>>  {
>>      struct vcpu *v = svc->vcpu;
>> -    int cpu = v->processor;
>> +    unsigned int bs;
>>  
>> -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
>> -                cpupool_domain_cpumask(v->domain));
>> +    for_each_affinity_balance_step( bs )
>> +    {
>> +        int cpu = v->processor;
>> +
>> +        if ( bs == BALANCE_SOFT_AFFINITY &&
>> +             !has_soft_affinity(v, v->cpu_hard_affinity) )
>> +            continue;
>>  
>> -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>> -        return cpu;
>> +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
>> +        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>> +                    cpupool_domain_cpumask(v->domain));
>>  
>> -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>> -                                   &svc->rqd->active)) )
>> -    {
>> -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
>> -                    cpumask_scratch_cpu(cpu));
>> -        return cpumask_first(cpumask_scratch_cpu(cpu));
>> -    }
>> +        /*
>> +         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
>> +         * in our affinity, go for it, for cache betterness.
>> +         */
>> +        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>> +            return cpu;
>>  
>> -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>> +        /*
>> +         * This is cases 2 or 5 (depending on bs): v->processor isn't there
>> +         * any longer, check if we at least can stay in our current runq.
>> +         */
>> +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>> +                                       &svc->rqd->active)) )
>> +        {
>> +            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
>> +                        &svc->rqd->active);
>> +            return cpumask_first(cpumask_scratch_cpu(cpu));
>> +        }
>>  
>> -    return cpumask_first(cpumask_scratch_cpu(cpu));
>> +        /*
>> +         * This is cases 3 or 6 (depending on bs): last stand, just one valid
>> +         * pcpu from our soft affinity, if we have one and if there's any. In
>> +         * fact, if we are doing soft-affinity, it is possible that we fail,
>> +         * which means we stay in the loop and look for hard affinity. OTOH,
>> +         * if we are at the hard-affinity balancing step, it's guaranteed that
>> +         * there is at least one valid cpu, and therefore we are sure that we
>> +         * return it, and never really exit the loop.
>> +         */
>> +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
>> +               bs == BALANCE_SOFT_AFFINITY);
>> +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> 
> So just checking my understanding here... at this point we're not taking
> into consideration load or idleness or anything else -- we're just
> saying, "Is there a cpu in my soft affinity it is *possible* to run on?"
>  So on a properly configured system, we should never take the second
> iteration of the loop?
> 
>> +        if ( likely(cpu < nr_cpu_ids) )
>> +            return cpu;
>> +    }
>> +    BUG_ON(1);
> 
> Do we want to BUG() here?  I don't think this constitutes an
> unrecoverable error; an ASSERT_UNREACHABLE() plus something random would
> be better, wouldn't it?

Oh, should have said, everything else looks good; but apparently I said
that before. :-)

 -George
Dario Faggioli July 25, 2017, 4 p.m. UTC | #3
On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
> > 
> > diff --git a/xen/common/sched_credit2.c
> > b/xen/common/sched_credit2.c
> > index c749d4e..54f6e21 100644
> > --- a/xen/common/sched_credit2.c
> > +++ b/xen/common/sched_credit2.c
> > @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu,
> > cpumask_t *mask)
> >  }
> >  
> >  /*
> > - * When a hard affinity change occurs, we may not be able to check
> > some
> > - * (any!) of the other runqueues, when looking for the best new
> > processor
> > - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that
> > happens, we
> > - * pick, in order of decreasing preference:
> > - *  - svc's current pcpu;
> > - *  - another pcpu from svc's current runq;
> > - *  - any cpu.
> > + * In csched2_cpu_pick(), it may not be possible to actually look
> > at remote
> > + * runqueues (the trylock-s on their spinlocks can fail!). If that
> > happens,
> > + * we pick, in order of decreasing preference:
> > + *  1) svc's current pcpu, if it is part of svc's soft affinity;
> > + *  2) a pcpu in svc's current runqueue that is also in svc's soft
> > affinity;
> > + *  3) just one valid pcpu from svc's soft affinity;
> > + *  4) svc's current pcpu, if it is part of svc's hard affinity;
> > + *  5) a pcpu in svc's current runqueue that is also in svc's hard
> > affinity;
> > + *  6) just one valid pcpu from svc's hard affinity
> > + *
> > + * Of course, 1, 2 and 3 makes sense only if svc has a soft
> > affinity. Also
> > + * note that at least 6 is guaranteed to _always_ return at least
> > one pcpu.
> >   */
> >  static int get_fallback_cpu(struct csched2_vcpu *svc)
> >  {
> >      struct vcpu *v = svc->vcpu;
> > -    int cpu = v->processor;
> > +    unsigned int bs;
> >  
> > -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
> > -                cpupool_domain_cpumask(v->domain));
> > +    for_each_affinity_balance_step( bs )
> > +    {
> > +        int cpu = v->processor;
> > +
> > +        if ( bs == BALANCE_SOFT_AFFINITY &&
> > +             !has_soft_affinity(v, v->cpu_hard_affinity) )
> > +            continue;
> >  
> > -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
> > -        return cpu;
> > +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
> > +        cpumask_and(cpumask_scratch_cpu(cpu),
> > cpumask_scratch_cpu(cpu),
> > +                    cpupool_domain_cpumask(v->domain));
> >  
> > -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> > -                                   &svc->rqd->active)) )
> > -    {
> > -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
> > -                    cpumask_scratch_cpu(cpu));
> > -        return cpumask_first(cpumask_scratch_cpu(cpu));
> > -    }
> > +        /*
> > +         * This is cases 1 or 4 (depending on bs): if v->processor 
> > is (still)
> > +         * in our affinity, go for it, for cache betterness.
> > +         */
> > +        if ( likely(cpumask_test_cpu(cpu,
> > cpumask_scratch_cpu(cpu))) )
> > +            return cpu;
> >  
> > -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
> > +        /*
> > +         * This is cases 2 or 5 (depending on bs): v->processor
> > isn't there
> > +         * any longer, check if we at least can stay in our
> > current runq.
> > +         */
> > +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
> > +                                       &svc->rqd->active)) )
> > +        {
> > +            cpumask_and(cpumask_scratch_cpu(cpu),
> > cpumask_scratch_cpu(cpu),
> > +                        &svc->rqd->active);
> > +            return cpumask_first(cpumask_scratch_cpu(cpu));
> > +        }
> >  
> > -    return cpumask_first(cpumask_scratch_cpu(cpu));
> > +        /*
> > +         * This is cases 3 or 6 (depending on bs): last stand,
> > just one valid
> > +         * pcpu from our soft affinity, if we have one and if
> > there's any. In
> > +         * fact, if we are doing soft-affinity, it is possible
> > that we fail,
> > +         * which means we stay in the loop and look for hard
> > affinity. OTOH,
> > +         * if we are at the hard-affinity balancing step, it's
> > guaranteed that
> > +         * there is at least one valid cpu, and therefore we are
> > sure that we
> > +         * return it, and never really exit the loop.
> > +         */
> > +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
> > +               bs == BALANCE_SOFT_AFFINITY);
> > +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
> 
> So just checking my understanding here... at this point we're not
> taking
> into consideration load or idleness or anything else -- we're just
> saying, "Is there a cpu in my soft affinity it is *possible* to run
> on?"
>
Exactly. If we are in this function, it means we failed to take the
locks we needed for making a choice based on load, idleness, etc., but
we need a CPU, so we pick whatever is valid.

For choosing among all the valid ones, we act as explained in the
comment.

>  So on a properly configured system, we should never take the second
> iteration of the loop?
> 
Mmm.. I think you're right. In fact, in a properly configured system,
we'll never go past step 3 (from the comment at the top).

Which is not ideal, or at least not what I had in mind. In fact, I
think it's better to check step 4 (svc->vcpu->processor in hard-
affinity) and step 5 (a CPU from svc's runqueue in hard affinity), as
that would mean avoiding a runqueue migration.

What about I basically kill step 3, i.e., if we reach this point during
the soft-affinity step, I just continue to the hard-affinity one?
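
To make that concrete, here is roughly what the tail of the loop body
would become (just a sketch against the hunk quoted above, not the
respun patch):

        /*
         * Sketch: drop "step 3". If we get here during the soft-affinity
         * step, give up on soft affinity for this fallback and move on to
         * the hard-affinity iteration, so that staying on the current
         * runqueue (steps 4 and 5) is preferred over a blind soft-affinity
         * pick that would force a runqueue migration.
         */
        if ( bs == BALANCE_SOFT_AFFINITY )
            continue;

        /* Hard-affinity step: there is always at least one valid pcpu. */
        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
        return cpumask_first(cpumask_scratch_cpu(cpu));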

> > +        if ( likely(cpu < nr_cpu_ids) )
> > +            return cpu;
> > +    }
> > +    BUG_ON(1);
> 
> Do we want to BUG() here?  I don't think this constitutes an
> unrecoverable error; an ASSERT_UNREACHABLE() plus something random
> would
> be better, wouldn't it?
> 
ASSERT_UNREACHABLE() is indeed much better. What do you mean with
"something random"? The value to be assigned to cpu?

Thanks and Regards,
Dario
George Dunlap July 25, 2017, 4:17 p.m. UTC | #4
On 07/25/2017 05:00 PM, Dario Faggioli wrote:
> On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
>> On 06/16/2017 03:13 PM, Dario Faggioli wrote:
>>>
>>> diff --git a/xen/common/sched_credit2.c
>>> b/xen/common/sched_credit2.c
>>> index c749d4e..54f6e21 100644
>>> --- a/xen/common/sched_credit2.c
>>> +++ b/xen/common/sched_credit2.c
>>> @@ -537,36 +537,71 @@ void smt_idle_mask_clear(unsigned int cpu,
>>> cpumask_t *mask)
>>>  }
>>>  
>>>  /*
>>> - * When a hard affinity change occurs, we may not be able to check
>>> some
>>> - * (any!) of the other runqueues, when looking for the best new
>>> processor
>>> - * for svc (as trylock-s in csched2_cpu_pick() can fail). If that
>>> happens, we
>>> - * pick, in order of decreasing preference:
>>> - *  - svc's current pcpu;
>>> - *  - another pcpu from svc's current runq;
>>> - *  - any cpu.
>>> + * In csched2_cpu_pick(), it may not be possible to actually look
>>> at remote
>>> + * runqueues (the trylock-s on their spinlocks can fail!). If that
>>> happens,
>>> + * we pick, in order of decreasing preference:
>>> + *  1) svc's current pcpu, if it is part of svc's soft affinity;
>>> + *  2) a pcpu in svc's current runqueue that is also in svc's soft
>>> affinity;
>>> + *  3) just one valid pcpu from svc's soft affinity;
>>> + *  4) svc's current pcpu, if it is part of svc's hard affinity;
>>> + *  5) a pcpu in svc's current runqueue that is also in svc's hard
>>> affinity;
>>> + *  6) just one valid pcpu from svc's hard affinity
>>> + *
>>> + * Of course, 1, 2 and 3 makes sense only if svc has a soft
>>> affinity. Also
>>> + * note that at least 6 is guaranteed to _always_ return at least
>>> one pcpu.
>>>   */
>>>  static int get_fallback_cpu(struct csched2_vcpu *svc)
>>>  {
>>>      struct vcpu *v = svc->vcpu;
>>> -    int cpu = v->processor;
>>> +    unsigned int bs;
>>>  
>>> -    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
>>> -                cpupool_domain_cpumask(v->domain));
>>> +    for_each_affinity_balance_step( bs )
>>> +    {
>>> +        int cpu = v->processor;
>>> +
>>> +        if ( bs == BALANCE_SOFT_AFFINITY &&
>>> +             !has_soft_affinity(v, v->cpu_hard_affinity) )
>>> +            continue;
>>>  
>>> -    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
>>> -        return cpu;
>>> +        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
>>> +        cpumask_and(cpumask_scratch_cpu(cpu),
>>> cpumask_scratch_cpu(cpu),
>>> +                    cpupool_domain_cpumask(v->domain));
>>>  
>>> -    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>>> -                                   &svc->rqd->active)) )
>>> -    {
>>> -        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
>>> -                    cpumask_scratch_cpu(cpu));
>>> -        return cpumask_first(cpumask_scratch_cpu(cpu));
>>> -    }
>>> +        /*
>>> +         * This is cases 1 or 4 (depending on bs): if v->processor 
>>> is (still)
>>> +         * in our affinity, go for it, for cache betterness.
>>> +         */
>>> +        if ( likely(cpumask_test_cpu(cpu,
>>> cpumask_scratch_cpu(cpu))) )
>>> +            return cpu;
>>>  
>>> -    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
>>> +        /*
>>> +         * This is cases 2 or 5 (depending on bs): v->processor
>>> isn't there
>>> +         * any longer, check if we at least can stay in our
>>> current runq.
>>> +         */
>>> +        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
>>> +                                       &svc->rqd->active)) )
>>> +        {
>>> +            cpumask_and(cpumask_scratch_cpu(cpu),
>>> cpumask_scratch_cpu(cpu),
>>> +                        &svc->rqd->active);
>>> +            return cpumask_first(cpumask_scratch_cpu(cpu));
>>> +        }
>>>  
>>> -    return cpumask_first(cpumask_scratch_cpu(cpu));
>>> +        /*
>>> +         * This is cases 3 or 6 (depending on bs): last stand,
>>> just one valid
>>> +         * pcpu from our soft affinity, if we have one and if
>>> there's any. In
>>> +         * fact, if we are doing soft-affinity, it is possible
>>> that we fail,
>>> +         * which means we stay in the loop and look for hard
>>> affinity. OTOH,
>>> +         * if we are at the hard-affinity balancing step, it's
>>> guaranteed that
>>> +         * there is at least one valid cpu, and therefore we are
>>> sure that we
>>> +         * return it, and never really exit the loop.
>>> +         */
>>> +        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
>>> +               bs == BALANCE_SOFT_AFFINITY);
>>> +        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
>>
>> So just checking my understanding here... at this point we're not
>> taking
>> into consideration load or idleness or anything else -- we're just
>> saying, "Is there a cpu in my soft affinity it is *possible* to run
>> on?"
>>
> Exactly. If we are in this function, it means we failed to take the
> locks we needed for making a choice based on load, idleness, etc., but
> we need a CPU, so we pick whatever is valid.
> 
> For choosing among all the valid ones, we act as explained in the
> comment.
> 
>>  So on a properly configured system, we should never take the second
>> iteration of the loop?
>>
> Mmm.. I think you're right. In fact, in a properly configured system,
> we'll never go past step 3 (from the comment at the top).
> 
> Which is not ideal, or at least not what I had in mind. In fact, I
> think it's better to check step 4 (svc->vcpu->processor in hard-
> affinity) and step 5 (a CPU from svc's runqueue in hard affinity), as
> that would mean avoiding a runqueue migration.
> 
> What about I basically kill step 3, i.e., if we reach this point during
> the soft-affinity step, I just continue to the hard-affinity one?

Hmm, well *normally* we would rather have a vcpu running within its soft
affinity, even if that means moving it to another runqueue.  Is your
idea that, the only reason we're in this particular code is because we
couldn't grab the lock we need to make a more informed decision; so
defer if possible to previous decisions, which (we might presume) was
able to make a more informed decision?

>>> +        if ( likely(cpu < nr_cpu_ids) )
>>> +            return cpu;
>>> +    }
>>> +    BUG_ON(1);
>>
>> Do we want to BUG() here?  I don't think this constitutes an
>> unrecoverable error; an ASSERT_UNREACHABLE() plus something random
>> would
>> be better, wouldn't it?
>>
> ASSERT_UNREACHABLE() is indeed much better. What do you mean with
> "something random"? The value to be assigned to cpu?

Er, yes, I meant the return value.  Returning 0, or v->processor would
be simple options.  *Really* defensive programming would attempt to
choose something somewhat sensible with the minimal risk of triggering
some other hidden assumptions (say, any cpu on our current runqueue).
But part of me says even thinking too long about it is a waste of time
for something we're 99.99% sure can never happen. :-)

 -George
Dario Faggioli July 25, 2017, 4:47 p.m. UTC | #5
On Tue, 2017-07-25 at 17:17 +0100, George Dunlap wrote:
> On 07/25/2017 05:00 PM, Dario Faggioli wrote:
> > On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
> > > 
> > Mmm.. I think you're right. In fact, in a properly configured
> > system,
> > we'll never go past step 3 (from the comment at the top).
> > 
> > Which is not ideal, or at least not what I had in mind. In fact, I
> > think it's better to check step 4 (svc->vcpu->processor in hard-
> > affinity) and step 5 (a CPU from svc's runqueue in hard affinity),
> > as
> > that would mean avoiding a runqueue migration.
> > 
> > What about I basically kill step 3, i.e., if we reach this point
> > during
> > the soft-affinity step, I just continue to the hard-affinity one?
> 
> Hmm, well *normally* we would rather have a vcpu running within its
> soft
> affinity, even if that means moving it to another runqueue.  
>
Yes, but both *ideally* and *normally*, we just should not be here. :-)

If we did end up here, we're in guessing territory, and, although what
you say about a guest wanting to run within its soft affinity is
always true from the guest's own point of view, our job as the scheduler
is to do what would be best for the system as a whole. But we are in a
situation where we could not gather the information to make such a
decision.

> Is your
> idea that, the only reason we're in this particular code is because
> we
> couldn't grab the lock we need to make a more informed decision; so
> defer if possible to previous decisions, which (we might presume) was
> able to make a more informed decision?
> 
Kind of, yes. Basically I think we should "escape" from this situation
as quickly as possible, causing as little trouble as possible both to
ourselves and to others, in the hope that it will go better next time.

Trying to stay in the same runqueue seems to me to fit this
requirement, as:
- as you say, we're here because a previous (presumably well informed)
  decision brought us here so, hopefully, staying here is not too bad, 
  neither for us nor overall;
- staying here is quicker and means less overhead for svc;
- staying here means less overhead overall. In fact, if we decide to 
  change runqueue, we will have to take the remote runqueue lock at 
  some point... And I'd prefer that to be for good reasons.

All that being said, it probably would be good to add a performance
counter, and try to get a sense of how frequently we actually end up in
this function as a fallback.
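
Something along these lines would probably do (sketch only: the counter
name is made up, and I'm just assuming the same PERFCOUNTER() /
SCHED_STAT_CRANK() plumbing that the other credit2 counters use):

/* xen/include/xen/perfc_defn.h: declare the new counter. */
PERFCOUNTER(need_fallback_cpu,      "csched2: need_fallback_cpu")

/* xen/common/sched_credit2.c: bump it when we enter the fallback path. */
static int get_fallback_cpu(struct csched2_vcpu *svc)
{
    struct vcpu *v = svc->vcpu;
    unsigned int bs;

    SCHED_STAT_CRANK(need_fallback_cpu);

    for_each_affinity_balance_step( bs )
    /* ... rest of the function unchanged ... */

With perfc=y, that should then show up in the output of the 'p' debug
key, which would tell us how often we really take this path.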

But in the meantime, yes, I'd try to make svc stay in the runqueue
where it is, in this case, if possible.

> > ASSERT_UNREACHABLE() is indeed much better. What do you mean with
> > "something random"? The value to be assigned to cpu?
> 
> Er, yes, I meant the return value.  Returning 0, or v->processor
> would
> be simple options.  *Really* defensive programming would attempt to
> choose something somewhat sensible with the minimal risk of triggering
> some other hidden assumptions (say, any cpu on our current runqueue).
> But part of me says even thinking too long about it is a waste of
> time
> for something we're 99.99% sure can never happen. :-)
> 
Agreed. IAC, I'll go for ASSERT_UNREACHABLE() and then see about using
either v->processor (with a comment), or a cpumask_any(something). Of
course the latter is expensive, but it should not be a big problem,
considering we'll never get there (I'll have a look at the generated
assembly, to confirm that).
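
For instance, the end of get_fallback_cpu() could look something like
the below (sketch only; whether to return v->processor or use
cpumask_any() is exactly what I still need to decide):

        if ( likely(cpu < nr_cpu_ids) )
            return cpu;
    }

    /* The hard-affinity step above always finds a valid cpu. */
    ASSERT_UNREACHABLE();
    /*
     * Still, on release builds, return something valid: v->processor keeps
     * svc in its current runqueue and doesn't require looking at anything
     * remote.
     */
    return v->processor;
}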

Regards,
Dario
George Dunlap July 25, 2017, 4:52 p.m. UTC | #6
On 07/25/2017 05:47 PM, Dario Faggioli wrote:
> On Tue, 2017-07-25 at 17:17 +0100, George Dunlap wrote:
>> On 07/25/2017 05:00 PM, Dario Faggioli wrote:
>>> On Tue, 2017-07-25 at 11:19 +0100, George Dunlap wrote:
>>>>
>>> Mmm.. I think you're right. In fact, in a properly configured
>>> system,
>>> we'll never go past step 3 (from the comment at the top).
>>>
>>> Which is not ideal, or at least not what I had in mind. In fact, I
>>> think it's better to check step 4 (svc->vcpu->processor in hard-
>>> affinity) and step 5 (a CPU from svc's runqueue in hard affinity),
>>> as
>>> that would mean avoiding a runqueue migration.
>>>
>>> What about I basically kill step 3, i.e., if we reach this point
>>> during
>>> the soft-affinity step, I just continue to the hard-affinity one?
>>
>> Hmm, well *normally* we would rather have a vcpu running within its
>> soft
>> affinity, even if that means moving it to another runqueue.  
>>
> Yes, but both *ideally* and *normally*, we just should not be here. :-)
> 
> If we did end up here, we're in guessing territory, and, although what
> you say about a guest wanting to run on within its soft-affinity is
> always true, from the guest own point of view, our job as the scheduler
> is to do what would be best for the system as a whole. But we are in a
> situation where we could not gather the information to make such a
> decision.
> 
>> Is your
>> idea that, the only reason we're in this particular code is because
>> we
>> couldn't grab the lock we need to make a more informed decision; so
>> defer if possible to previous decisions, which (we might presume) was
>> able to make a more informed decision?
>>
> Kind of, yes. Basically I think we should "escape" from this situation
> as quickly as possible, causing as little trouble as possible both to
> ourselves and to others, in the hope that it will go better next time.
> 
> Trying to stay in the same runqueue seems to me to fit this
> requirement, as:
> - as you say, we're here because a previous (presumably well informed)
>   decision brought us here so, hopefully, staying here is not too bad, 
>   neither for us nor overall;
> - staying here is quicker and means less overhead for svc;
> - staying here means less overhead overall. In fact, if we decide to 
>   change runqueue, we will have to take the remote runqueue lock at 
>   some point... And I'd prefer that to be for good reasons.
> 
> All that being said, it probably would be good to add a performance
> counter, and try to get a sense of how frequently we actually end up in
> this function as a fallback.
> 
> But in the meantime, yes, I'd try to make svc stay in the runqueue
> where it is, in this case, if possible.

Sounds good.  So are you going to respin the series then?

> 
>>> ASSERT_UNREACHABLE() is indeed much better. What do you mean with
>>> "something random"? The value to be assigned to cpu?
>>
>> Er, yes, I meant the return value.  Returning 0, or v->processor
>> would
>> be simple options.  *Really* defensive programming would attempt to
>> choose something somewhat sensible with the minimal risk of triggering
>> some other hidden assumptions (say, any cpu on our current runqueue).
>> But part of me says even thinking too long about it is a waste of
>> time
>> for something we're 99.99% sure can never happen. :-)
>>
> Agreed. IAC, I'll go for ASSERT_UNREACHABLE() and then see about using
> either v->processor (with a comment), or a cpumask_any(something). Of
> course the latter is expensive, but it should not be a big problem,
> considering we'll never get there (I'll have a look at the generated
> assembly, to confirm that).

OK, thanks.

 -George
Dario Faggioli July 25, 2017, 5:30 p.m. UTC | #7
On Tue, 2017-07-25 at 17:52 +0100, George Dunlap wrote:
> On 07/25/2017 05:47 PM, Dario Faggioli wrote:
> > 
> > All that being said, it probably would be good to add a performance
> > counter, and try to get a sense of how frequently we actually end
> > up in
> > this function as a fallback.
> > 
> > But in the meantime, yes, I'd try to make svc stay in the runqueue
> > where it is, in this case, if possible.
> 
> Sounds good.  So are you going to respin the series then?
> 
Yep, I'll rebase and respin this series. And then rebase the caps
series on top of this and respin it too.

Dario

Patch

diff --git a/xen/common/sched_credit2.c b/xen/common/sched_credit2.c
index c749d4e..54f6e21 100644
--- a/xen/common/sched_credit2.c
+++ b/xen/common/sched_credit2.c
@@ -537,36 +537,71 @@  void smt_idle_mask_clear(unsigned int cpu, cpumask_t *mask)
 }
 
 /*
- * When a hard affinity change occurs, we may not be able to check some
- * (any!) of the other runqueues, when looking for the best new processor
- * for svc (as trylock-s in csched2_cpu_pick() can fail). If that happens, we
- * pick, in order of decreasing preference:
- *  - svc's current pcpu;
- *  - another pcpu from svc's current runq;
- *  - any cpu.
+ * In csched2_cpu_pick(), it may not be possible to actually look at remote
+ * runqueues (the trylock-s on their spinlocks can fail!). If that happens,
+ * we pick, in order of decreasing preference:
+ *  1) svc's current pcpu, if it is part of svc's soft affinity;
+ *  2) a pcpu in svc's current runqueue that is also in svc's soft affinity;
+ *  3) just one valid pcpu from svc's soft affinity;
+ *  4) svc's current pcpu, if it is part of svc's hard affinity;
+ *  5) a pcpu in svc's current runqueue that is also in svc's hard affinity;
+ *  6) just one valid pcpu from svc's hard affinity
+ *
+ * Of course, 1, 2 and 3 makes sense only if svc has a soft affinity. Also
+ * note that at least 6 is guaranteed to _always_ return at least one pcpu.
  */
 static int get_fallback_cpu(struct csched2_vcpu *svc)
 {
     struct vcpu *v = svc->vcpu;
-    int cpu = v->processor;
+    unsigned int bs;
 
-    cpumask_and(cpumask_scratch_cpu(cpu), v->cpu_hard_affinity,
-                cpupool_domain_cpumask(v->domain));
+    for_each_affinity_balance_step( bs )
+    {
+        int cpu = v->processor;
+
+        if ( bs == BALANCE_SOFT_AFFINITY &&
+             !has_soft_affinity(v, v->cpu_hard_affinity) )
+            continue;
 
-    if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
-        return cpu;
+        affinity_balance_cpumask(v, bs, cpumask_scratch_cpu(cpu));
+        cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                    cpupool_domain_cpumask(v->domain));
 
-    if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
-                                   &svc->rqd->active)) )
-    {
-        cpumask_and(cpumask_scratch_cpu(cpu), &svc->rqd->active,
-                    cpumask_scratch_cpu(cpu));
-        return cpumask_first(cpumask_scratch_cpu(cpu));
-    }
+        /*
+         * This is cases 1 or 4 (depending on bs): if v->processor is (still)
+         * in our affinity, go for it, for cache betterness.
+         */
+        if ( likely(cpumask_test_cpu(cpu, cpumask_scratch_cpu(cpu))) )
+            return cpu;
 
-    ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)));
+        /*
+         * This is cases 2 or 5 (depending on bs): v->processor isn't there
+         * any longer, check if we at least can stay in our current runq.
+         */
+        if ( likely(cpumask_intersects(cpumask_scratch_cpu(cpu),
+                                       &svc->rqd->active)) )
+        {
+            cpumask_and(cpumask_scratch_cpu(cpu), cpumask_scratch_cpu(cpu),
+                        &svc->rqd->active);
+            return cpumask_first(cpumask_scratch_cpu(cpu));
+        }
 
-    return cpumask_first(cpumask_scratch_cpu(cpu));
+        /*
+         * This is cases 3 or 6 (depending on bs): last stand, just one valid
+         * pcpu from our soft affinity, if we have one and if there's any. In
+         * fact, if we are doing soft-affinity, it is possible that we fail,
+         * which means we stay in the loop and look for hard affinity. OTOH,
+         * if we are at the hard-affinity balancing step, it's guaranteed that
+         * there is at least one valid cpu, and therefore we are sure that we
+         * return it, and never really exit the loop.
+         */
+        ASSERT(!cpumask_empty(cpumask_scratch_cpu(cpu)) ||
+               bs == BALANCE_SOFT_AFFINITY);
+        cpu = cpumask_first(cpumask_scratch_cpu(cpu));
+        if ( likely(cpu < nr_cpu_ids) )
+            return cpu;
+    }
+    BUG_ON(1);
 }
 
 /*