Message ID | 1497609841.30417.5.camel@citrix.com (mailing list archive)
---|---
State | New, archived
>>> On 16.06.17 at 12:44, <dario.faggioli@citrix.com> wrote:
> On Fri, 2017-06-16 at 02:54 -0600, Jan Beulich wrote:
>> > > > On 14.06.17 at 18:53, <dario.faggioli@citrix.com> wrote:
>> >
>> > --- a/xen/arch/x86/domain.c
>> > +++ b/xen/arch/x86/domain.c
>> > @@ -112,12 +112,18 @@ static void play_dead(void)
>> >
>> >  static void idle_loop(void)
>> >  {
>> > +    unsigned int cpu = smp_processor_id();
>> > +
>> >      for ( ; ; )
>> >      {
>> > -        if ( cpu_is_offline(smp_processor_id()) )
>> > +        if ( cpu_is_offline(cpu) )
>> >              play_dead();
>> > -        (*pm_idle)();
>> > -        do_tasklet();
>> > +
>> > +        /* Are we here for running vcpu context tasklets, or for idling? */
>> > +        if ( unlikely(tasklet_work_to_do(cpu)) )
>>
>> I'm not really sure about the "unlikely()" here.
>>
> It's basically already there, without this patch, at the very beginning
> of do_tasklet():
>
>     if ( likely(*work_to_do != (TASKLET_enqueued|TASKLET_scheduled)) )
>         return;
>
> which is exactly the check that I moved into tasklet_work_to_do(), and,
> as you can see, it has the likely.
>
> So, I fundamentally kept it for consistency with the old code. I
> actually think it does make sense, but I don't have a very strong
> opinion about it.

Okay then.

>> >      unsigned long *work_to_do = &per_cpu(tasklet_work_to_do, cpu);
>> >      struct list_head *list = &per_cpu(tasklet_list, cpu);
>> >
>> > -    /*
>> > -     * Work must be enqueued *and* scheduled. Otherwise there is no work to
>> > -     * do, and/or scheduler needs to run to update idle vcpu priority.
>> > -     */
>> > -    if ( likely(*work_to_do != (TASKLET_enqueued|TASKLET_scheduled)) )
>> > -        return;
>>
>> Perhaps it also wouldn't hurt to convert this to an ASSERT() too.
>>
> Nope, I can't. It's a best effort check, and *work_to_do (which is
> per_cpu(tasklet_work_to_do, cpu)) can change, and the assert may fail.

How that? TASKLET_enqueued can only be cleared by do_tasklet()
itself, and I don't think nesting invocations of the function can or
should occur. TASKLET_scheduled is only being cleared when
schedule() observes that bit set without TASKLET_enqueued also
set. IOW there may be races in the setting of the bits, but since we
expect the caller to have done this check already, I think an
ASSERT() would be quite fine here.

>> > --- a/xen/include/xen/tasklet.h
>> > +++ b/xen/include/xen/tasklet.h
>> > @@ -40,9 +40,19 @@ DECLARE_PER_CPU(unsigned long, tasklet_work_to_do);
>> >  #define TASKLET_enqueued   (1ul << _TASKLET_enqueued)
>> >  #define TASKLET_scheduled  (1ul << _TASKLET_scheduled)
>> >
>> > +static inline bool tasklet_work_to_do(unsigned int cpu)
>> > +{
>> > +    /*
>> > +     * Work must be enqueued *and* scheduled. Otherwise there is no work to
>> > +     * do, and/or scheduler needs to run to update idle vcpu priority.
>> > +     */
>> > +    return per_cpu(tasklet_work_to_do, cpu) == (TASKLET_enqueued|
>> > +                                                TASKLET_scheduled);
>> > +}
>>
>> Wouldn't cpu_is_haltable() then also better use this new function?
>>
> Mmm... Perhaps. It's certainly less code churn.
>
> ARM code would then contain two invocations of cpu_is_haltable() (the
> first happens with IRQs enabled, so a second one with IRQs disabled is
> necessary). But that is *exactly* the same thing we do on x86 (they're
> just in different functions in that case).
>
> So, I reworked the patch according to these suggestions, and you can
> look at it below.

I'm confused: you've added further uses of cpu_is_haltable()
where the cheaper tasklet_work_to_do() would do. That's sort of
the opposite of what I've suggested.

Of course that suggestion of mine was more than a 1:1 replacement -
the implied question was whether cpu_is_haltable() simply checking
tasklet_work_to_do to be non-zero isn't too lax (and wouldn't better
be !tasklet_work_to_do()).

Jan
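[Concretely, the two options Jan is weighing differ only in the last
conjunct of cpu_is_haltable(). The sketch below is reconstructed from
the checks Dario's diagrams walk through further down, not copied from
the tree; cpu_is_haltable_alt is a hypothetical name for Jan's implied
variant.]

    /* As it stands: any tasklet work bit set blocks halting. */
    #define cpu_is_haltable(cpu)                    \
        (!softirq_pending(cpu) &&                   \
         cpu_online(cpu) &&                         \
         !per_cpu(tasklet_work_to_do, cpu))

    /*
     * Jan's implied alternative: only block halting when work is both
     * enqueued *and* scheduled, by reusing the new helper.
     */
    #define cpu_is_haltable_alt(cpu)                \
        (!softirq_pending(cpu) &&                   \
         cpu_online(cpu) &&                         \
         !tasklet_work_to_do(cpu))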
On Fri, 2017-06-16 at 05:47 -0600, Jan Beulich wrote:
> > On Fri, 2017-06-16 at 02:54 -0600, Jan Beulich wrote:
> > > > > > On 14.06.17 at 18:53, <dario.faggioli@citrix.com> wrote:
> > > >      unsigned long *work_to_do = &per_cpu(tasklet_work_to_do, cpu);
> > > >      struct list_head *list = &per_cpu(tasklet_list, cpu);
> > > >
> > > > -    /*
> > > > -     * Work must be enqueued *and* scheduled. Otherwise there is no work to
> > > > -     * do, and/or scheduler needs to run to update idle vcpu priority.
> > > > -     */
> > > > -    if ( likely(*work_to_do != (TASKLET_enqueued|TASKLET_scheduled)) )
> > > > -        return;
> > >
> > > Perhaps it also wouldn't hurt to convert this to an ASSERT() too.
> > >
> > Nope, I can't. It's a best effort check, and *work_to_do (which is
> > per_cpu(tasklet_work_to_do, cpu)) can change, and the assert may
> > fail.
>
> How that? TASKLET_enqueued can only be cleared by do_tasklet()
> itself, and I don't think nesting invocations of the function can or
> should occur. TASKLET_scheduled is only being cleared when
> schedule() observes that bit set without TASKLET_enqueued also
> set. IOW there may be races in the setting of the bits, but since we
> expect the caller to have done this check already, I think an
> ASSERT() would be quite fine here.
>
Ok, makes sense. I will add the ASSERT() (with something like what you
wrote here as a comment).

> > > Wouldn't cpu_is_haltable() then also better use this new
> > > function?
> > >
> > Mmm... Perhaps. It's certainly less code churn.
> >
> > ARM code would then contain two invocations of cpu_is_haltable()
> > (the first happens with IRQs enabled, so a second one with IRQs
> > disabled is necessary). But that is *exactly* the same thing we do
> > on x86 (they're just in different functions in that case).
> >
> > So, I reworked the patch according to these suggestions, and you
> > can look at it below.
>
> I'm confused: you've added further uses of cpu_is_haltable()
> where the cheaper tasklet_work_to_do() would do.
>
Indeed. Sorry! Fact is, I read your comment backwards, i.e., as if you
were saying something like "wouldn't cpu_is_haltable() better be used
here, instead of this new function?"

And it's not that your wording was ambiguous; it's me that, apparently,
can't read English! :-/

I'll rework the patch again...

> Of course that suggestion of mine was more than a 1:1 replacement -
> the implied question was whether cpu_is_haltable() simply checking
> tasklet_work_to_do to be non-zero isn't too lax (and wouldn't better
> be !tasklet_work_to_do()).
>
Now, to try to answer the real question... Let's assume that, on cpu x,
we are about to check cpu_is_haltable(), while, on cpu y,
tasklet_schedule_on_cpu(x) is being called, and manages to set
_TASKLET_enqueued in *work_to_do. I.e., in current code:

      CPU x                            CPU y
        |                                |
  cpu_is_haltable(x)             tasklet_schedule_on_cpu(x)
   !softirq_pending(x) == true    tasklet_enqueue()
   cpu_online(x) == true           test_and_set(TASKLET_enqueued,
        |                                       work_to_do)
   !work_to_do == FALSE

So we don't go to sleep, and we stay in the idle loop for the next
iteration. At that point, CPU y will have raised SCHEDULE_SOFTIRQ on x,
schedule() (still on x) will set TASKLET_scheduled, and we'll call
do_tasklet().

Basically, right now, we risk spinning for the time that passes between
TASKLET_enqueued being set and SCHEDULE_SOFTIRQ being raised and
reaching cpu x. This should be a very short window, and, considering
how the TASKLET_* flags are handled, this looks like the correct
behavior to me.

If we use !tasklet_work_to_do() in cpu_is_haltable():

      CPU x                            CPU y
        |                                |
  cpu_is_haltable(x)             tasklet_schedule_on_cpu(x)
   !softirq_pending(x) == true    tasklet_enqueue()
   cpu_online(x) == true           test_and_set(TASKLET_enqueued,
        |                                       work_to_do)
   !(work_to_do ==
     TASKLET_enqueued+TASKLET_scheduled) == TRUE

Which means we'd go to sleep... only to (most likely) be woken up very
soon by SCHEDULE_SOFTIRQ being thrown at us (cpu x) by cpu y.

Am I overlooking anything? And is this (this time) what you were
asking?

Assuming the answers are 'no' and 'yes', I think I'd leave
cpu_is_haltable() as it is (perhaps adding a brief comment on why it's
ok/better to check work_to_do directly, instead of calling
tasklet_work_to_do()).

Sorry again for the misunderstanding,
Dario
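[The respin Dario commits to above would, per the agreement with Jan,
replace the early return in do_tasklet() with an assertion along these
lines. This is a sketch of the agreed change, with Jan's reasoning as
the comment, not the patch actually posted later.]

    --- a/xen/common/tasklet.c
    +++ b/xen/common/tasklet.c
    @@ ... @@ void do_tasklet(void)
    -    /*
    -     * Work must be enqueued *and* scheduled. Otherwise there is no work to
    -     * do, and/or scheduler needs to run to update idle vcpu priority.
    -     */
    -    if ( likely(*work_to_do != (TASKLET_enqueued|TASKLET_scheduled)) )
    -        return;
    +    /*
    +     * The caller is expected to have checked that there is work to do
    +     * before calling us. Asserting that here is safe: TASKLET_enqueued
    +     * is only cleared by do_tasklet() itself (which does not nest), and
    +     * TASKLET_scheduled is only cleared by schedule() when it observes
    +     * that bit set without TASKLET_enqueued.
    +     */
    +    ASSERT(tasklet_work_to_do(cpu));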
>>> On 16.06.17 at 15:34, <dario.faggioli@citrix.com> wrote:
> Assuming the answers are 'no' and 'yes', I think I'd leave
> cpu_is_haltable() as it is

Agreed, your analysis was quite helpful.

Jan
On Fri, 16 Jun 2017, Dario Faggioli wrote:
> On Fri, 2017-06-16 at 02:54 -0600, Jan Beulich wrote:
> > > > > On 14.06.17 at 18:53, <dario.faggioli@citrix.com> wrote:
> > >
> > > --- a/xen/arch/x86/domain.c
> > > +++ b/xen/arch/x86/domain.c
> > > @@ -112,12 +112,18 @@ static void play_dead(void)
> > >
> > >  static void idle_loop(void)
> > >  {
> > > +    unsigned int cpu = smp_processor_id();
> > > +
> > >      for ( ; ; )
> > >      {
> > > -        if ( cpu_is_offline(smp_processor_id()) )
> > > +        if ( cpu_is_offline(cpu) )
> > >              play_dead();
> > > -        (*pm_idle)();
> > > -        do_tasklet();
> > > +
> > > +        /* Are we here for running vcpu context tasklets, or for idling? */
> > > +        if ( unlikely(tasklet_work_to_do(cpu)) )
> >
> > I'm not really sure about the "unlikely()" here.
> >
> It's basically already there, without this patch, at the very beginning
> of do_tasklet():
>
>     if ( likely(*work_to_do != (TASKLET_enqueued|TASKLET_scheduled)) )
>         return;
>
> which is exactly the check that I moved into tasklet_work_to_do(), and,
> as you can see, it has the likely.
>
> So, I fundamentally kept it for consistency with the old code. I
> actually think it does make sense, but I don't have a very strong
> opinion about it.
>
> > > +            do_tasklet(cpu);
> > > +        else
> > > +            (*pm_idle)();
> >
> > Please take the opportunity and drop the pointless parentheses
> > and indirection.
> >
> Ok.
>
> > > --- a/xen/common/tasklet.c
> > > +++ b/xen/common/tasklet.c
> > > @@ -104,19 +104,11 @@ static void do_tasklet_work(unsigned int cpu, struct list_head *list)
> > >  }
> > >
> > >  /* VCPU context work */
> > > -void do_tasklet(void)
> > > +void do_tasklet(unsigned int cpu)
> > >  {
> > > -    unsigned int cpu = smp_processor_id();
> >
> > I'm not convinced it is a good idea to have the caller pass in the
> > CPU number.
> >
> Yes, I know. I couldn't make up my mind about it either. I guess I can
> get rid of this aspect of the patch.
>
> > >      unsigned long *work_to_do = &per_cpu(tasklet_work_to_do, cpu);
> > >      struct list_head *list = &per_cpu(tasklet_list, cpu);
> > >
> > > -    /*
> > > -     * Work must be enqueued *and* scheduled. Otherwise there is no work to
> > > -     * do, and/or scheduler needs to run to update idle vcpu priority.
> > > -     */
> > > -    if ( likely(*work_to_do != (TASKLET_enqueued|TASKLET_scheduled)) )
> > > -        return;
> >
> > Perhaps it also wouldn't hurt to convert this to an ASSERT() too.
> >
> Nope, I can't. It's a best effort check, and *work_to_do (which is
> per_cpu(tasklet_work_to_do, cpu)) can change, and the assert may fail.
>
> The code is, of course, safe, because, if we think that there's work
> but there's not, the list of pending tasklets will be empty (and we
> check that after taking the tasklet lock).
>
> > > --- a/xen/include/xen/tasklet.h
> > > +++ b/xen/include/xen/tasklet.h
> > > @@ -40,9 +40,19 @@ DECLARE_PER_CPU(unsigned long, tasklet_work_to_do);
> > >  #define TASKLET_enqueued   (1ul << _TASKLET_enqueued)
> > >  #define TASKLET_scheduled  (1ul << _TASKLET_scheduled)
> > >
> > > +static inline bool tasklet_work_to_do(unsigned int cpu)
> > > +{
> > > +    /*
> > > +     * Work must be enqueued *and* scheduled. Otherwise there is no work to
> > > +     * do, and/or scheduler needs to run to update idle vcpu priority.
> > > +     */
> > > +    return per_cpu(tasklet_work_to_do, cpu) == (TASKLET_enqueued|
> > > +                                                TASKLET_scheduled);
> > > +}
> >
> > Wouldn't cpu_is_haltable() then also better use this new function?
> >
> Mmm... Perhaps. It's certainly less code churn.
>
> ARM code would then contain two invocations of cpu_is_haltable() (the
> first happens with IRQs enabled, so a second one with IRQs disabled is
> necessary). But that is *exactly* the same thing we do on x86 (they're
> just in different functions in that case).
>
> So, I reworked the patch according to these suggestions, and you can
> look at it below.
>
> If you like it better, I'm ok re-submitting it properly in this shape.
> Other thoughts, anyone else?
>
> Thanks and Regards,
> Dario
> ---
> NOTE that, since we call do_tasklet() after having checked
> cpu_is_haltable(), the if in there is not likely any longer.
> ---
> diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> index 76310ed..86cd612 100644
> --- a/xen/arch/arm/domain.c
> +++ b/xen/arch/arm/domain.c
> @@ -41,20 +41,28 @@ DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
>
>  void idle_loop(void)
>  {
> +    unsigned int cpu = smp_processor_id();
> +
>      for ( ; ; )
>      {
> -        if ( cpu_is_offline(smp_processor_id()) )
> +        if ( cpu_is_offline(cpu) )
>              stop_cpu();
>
> -        local_irq_disable();
> -        if ( cpu_is_haltable(smp_processor_id()) )
> +        /* Are we here for running vcpu context tasklets, or for idling? */
> +        if ( cpu_is_haltable(cpu) )
>          {
> -            dsb(sy);
> -            wfi();
> +            local_irq_disable();
> +            /* We need to check again, with IRQ disabled */
> +            if ( cpu_is_haltable(cpu) )
> +            {
> +                dsb(sy);
> +                wfi();
> +            }
> +            local_irq_enable();
>          }
> -        local_irq_enable();
> +        else
> +            do_tasklet();
>
> -        do_tasklet();
>          do_softirq();

Are you sure you want to check cpu_is_haltable() twice? It doesn't
make sense to me.
On Fri, 2017-06-16 at 10:41 -0700, Stefano Stabellini wrote:
> On Fri, 16 Jun 2017, Dario Faggioli wrote:
> >
> > diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
> > index 76310ed..86cd612 100644
> > --- a/xen/arch/arm/domain.c
> > +++ b/xen/arch/arm/domain.c
> > @@ -41,20 +41,28 @@ DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
> >
> >  void idle_loop(void)
> >  {
> > +    unsigned int cpu = smp_processor_id();
> > +
> >      for ( ; ; )
> >      {
> > -        if ( cpu_is_offline(smp_processor_id()) )
> > +        if ( cpu_is_offline(cpu) )
> >              stop_cpu();
> >
> > -        local_irq_disable();
> > -        if ( cpu_is_haltable(smp_processor_id()) )
> > +        /* Are we here for running vcpu context tasklets, or for idling? */
> > +        if ( cpu_is_haltable(cpu) )
> >          {
> > -            dsb(sy);
> > -            wfi();
> > +            local_irq_disable();
> > +            /* We need to check again, with IRQ disabled */
> > +            if ( cpu_is_haltable(cpu) )
> > +            {
> > +                dsb(sy);
> > +                wfi();
> > +            }
> > +            local_irq_enable();
> >          }
> > -        local_irq_enable();
> > +        else
> > +            do_tasklet();
> >
> > -        do_tasklet();
> >          do_softirq();
>
> Are you sure you want to check cpu_is_haltable() twice? It doesn't
> make sense to me.
>
It's because IRQs are still enabled during the first check, so it has
to be redone with them disabled.

But anyway, discard this patch. I'll go back to (a slightly modified
version of) the first one I sent, which defines a tasklet-specific
helper function. I'll send it on Monday.

Regards,
Dario
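[Spelled out, the pattern Dario is defending is the usual
check/recheck idiom around an idle instruction. Below is the quoted
ARM hunk in plain form; the comments are editorial annotations, not
part of the patch.]

    if ( cpu_is_haltable(cpu) )       /* cheap first check, IRQs still on */
    {
        local_irq_disable();
        /*
         * Recheck with IRQs off: between the first check and
         * local_irq_disable(), another CPU may have raised work for us
         * (e.g. tasklet_schedule_on_cpu() followed by SCHEDULE_SOFTIRQ).
         * With IRQs disabled no new wakeup can slip in unnoticed, and a
         * pending interrupt still terminates the wfi() below, so no
         * wakeup is ever lost.
         */
        if ( cpu_is_haltable(cpu) )
        {
            dsb(sy);                  /* barrier before going to sleep */
            wfi();                    /* wait for interrupt */
        }
        local_irq_enable();
    }
    else
        do_tasklet();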
diff --git a/xen/arch/arm/domain.c b/xen/arch/arm/domain.c
index 76310ed..86cd612 100644
--- a/xen/arch/arm/domain.c
+++ b/xen/arch/arm/domain.c
@@ -41,20 +41,28 @@ DEFINE_PER_CPU(struct vcpu *, curr_vcpu);
 
 void idle_loop(void)
 {
+    unsigned int cpu = smp_processor_id();
+
     for ( ; ; )
     {
-        if ( cpu_is_offline(smp_processor_id()) )
+        if ( cpu_is_offline(cpu) )
             stop_cpu();
 
-        local_irq_disable();
-        if ( cpu_is_haltable(smp_processor_id()) )
+        /* Are we here for running vcpu context tasklets, or for idling? */
+        if ( cpu_is_haltable(cpu) )
         {
-            dsb(sy);
-            wfi();
+            local_irq_disable();
+            /* We need to check again, with IRQ disabled */
+            if ( cpu_is_haltable(cpu) )
+            {
+                dsb(sy);
+                wfi();
+            }
+            local_irq_enable();
         }
-        local_irq_enable();
+        else
+            do_tasklet();
 
-        do_tasklet();
         do_softirq();
         /*
          * We MUST be last (or before dsb, wfi). Otherwise after we get the
diff --git a/xen/arch/x86/domain.c b/xen/arch/x86/domain.c
index 49388f4..c520fdd 100644
--- a/xen/arch/x86/domain.c
+++ b/xen/arch/x86/domain.c
@@ -112,12 +112,18 @@ static void play_dead(void)
 
 static void idle_loop(void)
 {
+    unsigned int cpu = smp_processor_id();
+
     for ( ; ; )
     {
-        if ( cpu_is_offline(smp_processor_id()) )
+        if ( cpu_is_offline(cpu) )
             play_dead();
-        (*pm_idle)();
-        do_tasklet();
+
+        /* Are we here for idling, or for running vcpu context tasklets? */
+        if ( cpu_is_haltable(cpu) )
+            pm_idle();
+        else
+            do_tasklet();
         do_softirq();
         /*
          * We MUST be last (or before pm_idle). Otherwise after we get the
diff --git a/xen/common/tasklet.c b/xen/common/tasklet.c
index 365a777..ebdce12 100644
--- a/xen/common/tasklet.c
+++ b/xen/common/tasklet.c
@@ -114,7 +114,7 @@ void do_tasklet(void)
      * Work must be enqueued *and* scheduled. Otherwise there is no work to
      * do, and/or scheduler needs to run to update idle vcpu priority.
      */
-    if ( likely(*work_to_do != (TASKLET_enqueued|TASKLET_scheduled)) )
+    if ( *work_to_do != (TASKLET_enqueued|TASKLET_scheduled) )
         return;
 
     spin_lock_irq(&tasklet_lock);
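[Putting the thread's conclusions together - keep the unlikely(), drop
the (*pm_idle)() parentheses and indirection, don't pass cpu into
do_tasklet(), and assert inside do_tasklet() - the respin Dario
promises for Monday would plausibly shape the x86 idle loop as below.
This is a reconstruction from the review comments, not the actual v2
posting.]

    static void idle_loop(void)
    {
        unsigned int cpu = smp_processor_id();

        for ( ; ; )
        {
            if ( cpu_is_offline(cpu) )
                play_dead();

            /* Are we here for running vcpu context tasklets, or for idling? */
            if ( unlikely(tasklet_work_to_do(cpu)) )
                do_tasklet();   /* now ASSERT()s that work is there */
            else
                pm_idle();      /* indirection dropped, per review */
            do_softirq();
        }
    }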