diff mbox

[v2,7/7] ARM: smp: Add runtime PM support for CPU hotplug

Message ID 1441310314-8857-8-git-send-email-lina.iyer@linaro.org (mailing list archive)
State New, archived
Headers show

Commit Message

Lina Iyer Sept. 3, 2015, 7:58 p.m. UTC
Enable runtime PM for CPU devices. Do a runtime get of the CPU device
when the CPU is hotplugged in and a runtime put of the CPU device when
the CPU is hotplugged off. When all the CPUs in a domain are hotplugged
off, the domain may also be powered off and cluster_pm_enter/exit()
notifications are be sent out.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Lina Iyer <lina.iyer@linaro.org>
---
 arch/arm/kernel/smp.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

Comments

Stephen Boyd Sept. 4, 2015, 3:59 a.m. UTC | #1
On 09/03, Lina Iyer wrote:
> @@ -137,7 +138,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>  		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>  	}
>  
> -

Please remove noise.

>  	memset(&secondary_data, 0, sizeof(secondary_data));
>  	return ret;
>  }
> @@ -271,6 +271,16 @@ void __cpu_die(unsigned int cpu)
>  void __ref cpu_die(void)
>  {
>  	unsigned int cpu = smp_processor_id();
> +	struct device *cpu_dev;
> +
> +	/*
> +	 * We dont need the CPU device anymore.

s/dont/don't/

> +	 * Lets do this before IRQs are disabled to allow

s/Lets/Let's/

> +	 * runtime PM to suspend the domain as well.
> +	 */
Geert Uytterhoeven Sept. 4, 2015, 7:39 a.m. UTC | #2
On Thu, Sep 3, 2015 at 9:58 PM, Lina Iyer <lina.iyer@linaro.org> wrote:
> Enable runtime PM for CPU devices. Do a runtime get of the CPU device
> when the CPU is hotplugged in and a runtime put of the CPU device when

s/in/on/

> the CPU is hotplugged off. When all the CPUs in a domain are hotplugged
> off, the domain may also be powered off and cluster_pm_enter/exit()
> notifications are be sent out.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds
Russell King - ARM Linux Sept. 4, 2015, 9:17 a.m. UTC | #3
On Thu, Sep 03, 2015 at 01:58:34PM -0600, Lina Iyer wrote:
> @@ -401,6 +412,11 @@ asmlinkage void secondary_start_kernel(void)
>  	local_irq_enable();
>  	local_fiq_enable();
>  
> +	/* We are running, enable runtime PM for the CPU. */
> +	cpu_dev = get_cpu_device(cpu);
> +	if (cpu_dev)
> +		pm_runtime_get_sync(cpu_dev);
> +

Please no.  This is fragile.

pm_runtime_get_sync() can sleep, and this path is used when the system
is fully up and running.  Locks may be held, which cause this to sleep.
However, we're calling it from a context where we can't sleep - which
is the general rule for any part of system initialisation where we
have not entered the idle thread yet (the idle thread is what we run
when sleeping.)

NAK on this.
Russell King - ARM Linux Sept. 4, 2015, 9:27 a.m. UTC | #4
On Fri, Sep 04, 2015 at 10:17:35AM +0100, Russell King - ARM Linux wrote:
> On Thu, Sep 03, 2015 at 01:58:34PM -0600, Lina Iyer wrote:
> > @@ -401,6 +412,11 @@ asmlinkage void secondary_start_kernel(void)
> >  	local_irq_enable();
> >  	local_fiq_enable();
> >  
> > +	/* We are running, enable runtime PM for the CPU. */
> > +	cpu_dev = get_cpu_device(cpu);
> > +	if (cpu_dev)
> > +		pm_runtime_get_sync(cpu_dev);
> > +
> 
> Please no.  This is fragile.
> 
> pm_runtime_get_sync() can sleep, and this path is used when the system
> is fully up and running.  Locks may be held, which cause this to sleep.
> However, we're calling it from a context where we can't sleep - which
> is the general rule for any part of system initialisation where we
> have not entered the idle thread yet (the idle thread is what we run
> when sleeping.)
> 
> NAK on this.

There's another issue here.  This is obviously wrong.  If the CPU is
inside the PM domain, then don't you need the PM domain powered up for
the CPU to start executing instructions?

If the CPU can run instructions with the PM domain that it's allegedly
inside powered down, then the CPU is _not_ part of the PM domain, and
this whole thing is total crap.

Sorry, I'm really not impressed by the "design" here, it seems to be
totally screwed.
Lina Iyer Sept. 4, 2015, 3:12 p.m. UTC | #5
On Fri, Sep 04 2015 at 03:27 -0600, Russell King - ARM Linux wrote:
>On Fri, Sep 04, 2015 at 10:17:35AM +0100, Russell King - ARM Linux wrote:
>> On Thu, Sep 03, 2015 at 01:58:34PM -0600, Lina Iyer wrote:
>> > @@ -401,6 +412,11 @@ asmlinkage void secondary_start_kernel(void)
>> >  	local_irq_enable();
>> >  	local_fiq_enable();
>> >
>> > +	/* We are running, enable runtime PM for the CPU. */
>> > +	cpu_dev = get_cpu_device(cpu);
>> > +	if (cpu_dev)
>> > +		pm_runtime_get_sync(cpu_dev);
>> > +
>>
>> Please no.  This is fragile.
>>
>> pm_runtime_get_sync() can sleep, and this path is used when the system
>> is fully up and running.  Locks may be held, which cause this to sleep.
>> However, we're calling it from a context where we can't sleep - which
>> is the general rule for any part of system initialisation where we
>> have not entered the idle thread yet (the idle thread is what we run
>> when sleeping.)
>>
More explanation below.

Another patch (3/7) in this series defines CPU devices as IRQ safe and
therefore the dev->power.lock would be spinlock'd before calling the
rest of the PM runtime code and the domain. The CPU PM domain would also
be an IRQ safe domain (patch 2/7 makes the genpd framework usable in IRQ
safe context).

>> NAK on this.
>
>There's another issue here.  This is obviously wrong.  If the CPU is
>inside the PM domain, then don't you need the PM domain powered up for
>the CPU to start executing instructions?
>
This would not be the case for CPU PM domains. The CPU PM domains AFAIK,
are triggered by the same wake up interrupt that brings the CPU out of
its power downs state. The domain hardware has to be awake before the
CPU executes its first instruction. Thefore the hw power up is a logical
OR of the wakeup signals to any of the cores.

Similarly, the domain hardware would be the last to power down after all
the CPUs.  Generally tied to the execution of WFI command by the ARM CPU
core. The domain power down signal is a logical AND of all the WFI
signals of all the cores in the domain.

>If the CPU can run instructions with the PM domain that it's allegedly
>inside powered down, then the CPU is _not_ part of the PM domain, and
>this whole thing is total crap.
>
CPU PM domains have to be synchronized in hardware and all these can
only happen when the SoC provides a domain that can power on before any
of the CPU powers up and power down only after the all CPUs have powered
down.

What s/w generally can do in the CPU domain power on/off callbacks from
genpd is to configure which state the domain should be in, when all the
CPUs have powered down. Many SoCs, have PM domains that could be in
retention in anticipation of an early wakeup or could be completely
cache flushed and powered down, when suspending or sleeping for a longer
duration. The power off callbacks for the PM domain does the
configuration and it doesnt take effect until all the CPUs power down.

While powering on the domain in the genpd power on callback, all we do
is to ensure that the domain is reconfigured to remain active (ignore
all WFI instruction from the CPU, which are not tied to CPU power down.
Dont want the domain powering off when all CPUs in the domain are just
clock gated.) until the domain is configured to its idle state by the
last CPU in Linux.

>Sorry, I'm really not impressed by the "design" here, it seems to be
>totally screwed.
>

Thanks,
Lina
Lina Iyer Sept. 4, 2015, 3:13 p.m. UTC | #6
On Thu, Sep 03 2015 at 21:59 -0600, Stephen Boyd wrote:
>On 09/03, Lina Iyer wrote:
>> @@ -137,7 +138,6 @@ int __cpu_up(unsigned int cpu, struct task_struct *idle)
>>  		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
>>  	}
>>
>> -
>
>Please remove noise.
>
Sorry, I forgot to address it earlier. Thought I did.
Apologize.

>>  	memset(&secondary_data, 0, sizeof(secondary_data));
>>  	return ret;
>>  }
>> @@ -271,6 +271,16 @@ void __cpu_die(unsigned int cpu)
>>  void __ref cpu_die(void)
>>  {
>>  	unsigned int cpu = smp_processor_id();
>> +	struct device *cpu_dev;
>> +
>> +	/*
>> +	 * We dont need the CPU device anymore.
>
>s/dont/don't/
>
>> +	 * Lets do this before IRQs are disabled to allow
>
>s/Lets/Let's/
>
>> +	 * runtime PM to suspend the domain as well.
>> +	 */
>
>-- 
>Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
>a Linux Foundation Collaborative Project
Russell King - ARM Linux Sept. 4, 2015, 4:23 p.m. UTC | #7
On Fri, Sep 04, 2015 at 09:12:02AM -0600, Lina Iyer wrote:
> On Fri, Sep 04 2015 at 03:27 -0600, Russell King - ARM Linux wrote:
> >On Fri, Sep 04, 2015 at 10:17:35AM +0100, Russell King - ARM Linux wrote:
> >>On Thu, Sep 03, 2015 at 01:58:34PM -0600, Lina Iyer wrote:
> >>> @@ -401,6 +412,11 @@ asmlinkage void secondary_start_kernel(void)
> >>>  	local_irq_enable();
> >>>  	local_fiq_enable();
> >>>
> >>> +	/* We are running, enable runtime PM for the CPU. */
> >>> +	cpu_dev = get_cpu_device(cpu);
> >>> +	if (cpu_dev)
> >>> +		pm_runtime_get_sync(cpu_dev);
> >>> +
> >>
> >>Please no.  This is fragile.
> >>
> >>pm_runtime_get_sync() can sleep, and this path is used when the system
> >>is fully up and running.  Locks may be held, which cause this to sleep.
> >>However, we're calling it from a context where we can't sleep - which
> >>is the general rule for any part of system initialisation where we
> >>have not entered the idle thread yet (the idle thread is what we run
> >>when sleeping.)
> >>
> More explanation below.
> 
> Another patch (3/7) in this series defines CPU devices as IRQ safe and
> therefore the dev->power.lock would be spinlock'd before calling the
> rest of the PM runtime code and the domain. The CPU PM domain would also
> be an IRQ safe domain (patch 2/7 makes the genpd framework usable in IRQ
> safe context).
> 
> >>NAK on this.
> >
> >There's another issue here.  This is obviously wrong.  If the CPU is
> >inside the PM domain, then don't you need the PM domain powered up for
> >the CPU to start executing instructions?
> >
> This would not be the case for CPU PM domains. The CPU PM domains AFAIK,
> are triggered by the same wake up interrupt that brings the CPU out of
> its power downs state. The domain hardware has to be awake before the
> CPU executes its first instruction. Thefore the hw power up is a logical
> OR of the wakeup signals to any of the cores.

You're taking the behaviour of the hardware you have in front of you
and claiming that it's true everywhere, and shoving that into generic
code.

I know, for example on OMAP, you have to power up the CPU first before
you can "wake" it.

I wouldn't be surprised if other SoCs are like that: where they require
the CPU core to be powered and held in reset, before releasing the reset
to then allow them to start executing the secondary core bringup.

Relying on hardware to do this sounds really fragile and bad design to
me.

If you want to persue your current design, don't make it generic code,
because it's got no right to be generic with assumptions like that.
Lina Iyer Sept. 4, 2015, 5:02 p.m. UTC | #8
On Fri, Sep 04 2015 at 10:23 -0600, Russell King - ARM Linux wrote:
>On Fri, Sep 04, 2015 at 09:12:02AM -0600, Lina Iyer wrote:
>> On Fri, Sep 04 2015 at 03:27 -0600, Russell King - ARM Linux wrote:
>> >On Fri, Sep 04, 2015 at 10:17:35AM +0100, Russell King - ARM Linux wrote:
>> >>On Thu, Sep 03, 2015 at 01:58:34PM -0600, Lina Iyer wrote:
>> >>> @@ -401,6 +412,11 @@ asmlinkage void secondary_start_kernel(void)
>> >>>  	local_irq_enable();
>> >>>  	local_fiq_enable();
>> >>>
>> >>> +	/* We are running, enable runtime PM for the CPU. */
>> >>> +	cpu_dev = get_cpu_device(cpu);
>> >>> +	if (cpu_dev)
>> >>> +		pm_runtime_get_sync(cpu_dev);
>> >>> +
>> >>
>> >>Please no.  This is fragile.
>> >>
>> >>pm_runtime_get_sync() can sleep, and this path is used when the system
>> >>is fully up and running.  Locks may be held, which cause this to sleep.
>> >>However, we're calling it from a context where we can't sleep - which
>> >>is the general rule for any part of system initialisation where we
>> >>have not entered the idle thread yet (the idle thread is what we run
>> >>when sleeping.)
>> >>
>> More explanation below.
>>
>> Another patch (3/7) in this series defines CPU devices as IRQ safe and
>> therefore the dev->power.lock would be spinlock'd before calling the
>> rest of the PM runtime code and the domain. The CPU PM domain would also
>> be an IRQ safe domain (patch 2/7 makes the genpd framework usable in IRQ
>> safe context).
>>
>> >>NAK on this.
>> >
>> >There's another issue here.  This is obviously wrong.  If the CPU is
>> >inside the PM domain, then don't you need the PM domain powered up for
>> >the CPU to start executing instructions?
>> >
>> This would not be the case for CPU PM domains. The CPU PM domains AFAIK,
>> are triggered by the same wake up interrupt that brings the CPU out of
>> its power downs state. The domain hardware has to be awake before the
>> CPU executes its first instruction. Thefore the hw power up is a logical
>> OR of the wakeup signals to any of the cores.
>
>You're taking the behaviour of the hardware you have in front of you
>and claiming that it's true everywhere, and shoving that into generic
>code.
>
>I know, for example on OMAP, you have to power up the CPU first before
>you can "wake" it.
>
>I wouldn't be surprised if other SoCs are like that: where they require
>the CPU core to be powered and held in reset, before releasing the reset
>to then allow them to start executing the secondary core bringup.
>
It would still require that the CPU's domain be powered on in the hw,
before the CPU can run Linux.

>Relying on hardware to do this sounds really fragile and bad design to
>me.
>
>If you want to persue your current design, don't make it generic code,
>because it's got no right to be generic with assumptions like that.
>
Hmm, okay. I can look at alternatives like  hotplug notifiers.

-- Lina
Russell King - ARM Linux Sept. 4, 2015, 5:46 p.m. UTC | #9
On Fri, Sep 04, 2015 at 11:02:11AM -0600, Lina Iyer wrote:
> On Fri, Sep 04 2015 at 10:23 -0600, Russell King - ARM Linux wrote:
> >You're taking the behaviour of the hardware you have in front of you
> >and claiming that it's true everywhere, and shoving that into generic
> >code.
> >
> >I know, for example on OMAP, you have to power up the CPU first before
> >you can "wake" it.
> >
> >I wouldn't be surprised if other SoCs are like that: where they require
> >the CPU core to be powered and held in reset, before releasing the reset
> >to then allow them to start executing the secondary core bringup.
> >
> It would still require that the CPU's domain be powered on in the hw,
> before the CPU can run Linux.
> 
> >Relying on hardware to do this sounds really fragile and bad design to
> >me.
> >
> >If you want to persue your current design, don't make it generic code,
> >because it's got no right to be generic with assumptions like that.
> >
> Hmm, okay. I can look at alternatives like  hotplug notifiers.

How about putting the pm resume in path of the _requesting_ CPU?
IOW, in __cpu_up().
Grygorii Strashko Sept. 4, 2015, 5:57 p.m. UTC | #10
Hi Lina,

On 09/04/2015 06:12 PM, Lina Iyer wrote:
> On Fri, Sep 04 2015 at 03:27 -0600, Russell King - ARM Linux wrote:
>> On Fri, Sep 04, 2015 at 10:17:35AM +0100, Russell King - ARM Linux wrote:
>>> On Thu, Sep 03, 2015 at 01:58:34PM -0600, Lina Iyer wrote:
>>> > @@ -401,6 +412,11 @@ asmlinkage void secondary_start_kernel(void)
>>> >      local_irq_enable();
>>> >      local_fiq_enable();
>>> >
>>> > +    /* We are running, enable runtime PM for the CPU. */
>>> > +    cpu_dev = get_cpu_device(cpu);
>>> > +    if (cpu_dev)
>>> > +        pm_runtime_get_sync(cpu_dev);
>>> > +
>>>
>>> Please no.  This is fragile.
>>>
>>> pm_runtime_get_sync() can sleep, and this path is used when the system
>>> is fully up and running.  Locks may be held, which cause this to sleep.
>>> However, we're calling it from a context where we can't sleep - which
>>> is the general rule for any part of system initialisation where we
>>> have not entered the idle thread yet (the idle thread is what we run
>>> when sleeping.)
>>>
> More explanation below.
> 
> Another patch (3/7) in this series defines CPU devices as IRQ safe and
> therefore the dev->power.lock would be spinlock'd before calling the
> rest of the PM runtime code and the domain. The CPU PM domain would also
> be an IRQ safe domain (patch 2/7 makes the genpd framework usable in IRQ
> safe context).

There is one "small" problem with such approach :(

- It's incompatible with -RT kernel, because PM runtime can't be used
in atomic context on -RT. As result, CPU's hotplug will be broken
(CPUIdle will be broken also, but CPU hotplug is more important at least for me:).

Also, as for me, PM runtime + genpd is a little bit heavy-weight things
to be used for CPUIdle and It could affect on states with small wake-up latencies.

[...]
Alan Stern Sept. 4, 2015, 6:45 p.m. UTC | #11
On Fri, 4 Sep 2015, Grygorii Strashko wrote:

> There is one "small" problem with such approach :(
> 
> - It's incompatible with -RT kernel, because PM runtime can't be used
> in atomic context on -RT.

Can you explain this more fully?  Why can't runtime PM be used in 
atomic context in the -rt kernels?

Alan Stern
Grygorii Strashko Sept. 4, 2015, 9:46 p.m. UTC | #12
On 09/04/2015 09:45 PM, Alan Stern wrote:
> On Fri, 4 Sep 2015, Grygorii Strashko wrote:
> 
>> There is one "small" problem with such approach :(
>>
>> - It's incompatible with -RT kernel, because PM runtime can't be used
>> in atomic context on -RT.
> 
> Can you explain this more fully?  Why can't runtime PM be used in
> atomic context in the -rt kernels?
>  

See:
 http://lwn.net/Articles/146861/
 https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F

spinlock_t
    Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
 do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
 inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT. 

As result, have to do things like:
https://lkml.org/lkml/2015/8/18/161
https://lkml.org/lkml/2015/8/18/162

Sorry for brief reply - Friday/Sat night :)
Alan Stern Sept. 5, 2015, 3:39 p.m. UTC | #13
On Sat, 5 Sep 2015, Grygorii Strashko wrote:

> On 09/04/2015 09:45 PM, Alan Stern wrote:
> > On Fri, 4 Sep 2015, Grygorii Strashko wrote:
> > 
> >> There is one "small" problem with such approach :(
> >>
> >> - It's incompatible with -RT kernel, because PM runtime can't be used
> >> in atomic context on -RT.
> > 
> > Can you explain this more fully?  Why can't runtime PM be used in
> > atomic context in the -rt kernels?
> >  
> 
> See:
>  http://lwn.net/Articles/146861/
>  https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F
> 
> spinlock_t
>     Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
>  do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
>  inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT. 
> 
> As result, have to do things like:
> https://lkml.org/lkml/2015/8/18/161
> https://lkml.org/lkml/2015/8/18/162
> 
> Sorry for brief reply - Friday/Sat night :)

I see.  Although we normally think of interrupt contexts as being
atomic, in an -rt kernel this isn't true any more because things like
spin_lock_irq don't actually disable interrupts.

Therefore it would be correct to say that in -rt kernels, runtime PM
can be used in interrupt context (if the device is marked as irq-safe),
but not in atomic context.  Right?

Alan Stern
Rafael J. Wysocki Sept. 7, 2015, 1:04 p.m. UTC | #14
On Saturday, September 05, 2015 11:39:20 AM Alan Stern wrote:
> On Sat, 5 Sep 2015, Grygorii Strashko wrote:
> 
> > On 09/04/2015 09:45 PM, Alan Stern wrote:
> > > On Fri, 4 Sep 2015, Grygorii Strashko wrote:
> > > 
> > >> There is one "small" problem with such approach :(
> > >>
> > >> - It's incompatible with -RT kernel, because PM runtime can't be used
> > >> in atomic context on -RT.
> > > 
> > > Can you explain this more fully?  Why can't runtime PM be used in
> > > atomic context in the -rt kernels?
> > >  
> > 
> > See:
> >  http://lwn.net/Articles/146861/
> >  https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F
> > 
> > spinlock_t
> >     Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
> >  do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
> >  inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT. 
> > 
> > As result, have to do things like:
> > https://lkml.org/lkml/2015/8/18/161
> > https://lkml.org/lkml/2015/8/18/162
> > 
> > Sorry for brief reply - Friday/Sat night :)
> 
> I see.  Although we normally think of interrupt contexts as being
> atomic, in an -rt kernel this isn't true any more because things like
> spin_lock_irq don't actually disable interrupts.
> 
> Therefore it would be correct to say that in -rt kernels, runtime PM
> can be used in interrupt context (if the device is marked as irq-safe),
> but not in atomic context.  Right?

Right.

Whatever is suitable for interrupt context in the mainline, will be suitable
for that in -rt kernels too.  However, what is suitable for the idle loop
in the mainline, may not be suitable for that in -rt kernels.

That's why raw_spin_lock/unlock() need to be used within the idle loop.

Thanks,
Rafael
Grygorii Strashko Sept. 7, 2015, 1:37 p.m. UTC | #15
On 09/07/2015 04:04 PM, Rafael J. Wysocki wrote:
> On Saturday, September 05, 2015 11:39:20 AM Alan Stern wrote:
>> On Sat, 5 Sep 2015, Grygorii Strashko wrote:
>>
>>> On 09/04/2015 09:45 PM, Alan Stern wrote:
>>>> On Fri, 4 Sep 2015, Grygorii Strashko wrote:
>>>>
>>>>> There is one "small" problem with such approach :(
>>>>>
>>>>> - It's incompatible with -RT kernel, because PM runtime can't be used
>>>>> in atomic context on -RT.
>>>>
>>>> Can you explain this more fully?  Why can't runtime PM be used in
>>>> atomic context in the -rt kernels?
>>>>   
>>>
>>> See:
>>>   http://lwn.net/Articles/146861/
>>>   https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F
>>>
>>> spinlock_t
>>>      Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
>>>   do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
>>>   inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT.
>>>
>>> As result, have to do things like:
>>> https://lkml.org/lkml/2015/8/18/161
>>> https://lkml.org/lkml/2015/8/18/162
>>>
>>> Sorry for brief reply - Friday/Sat night :)
>>
>> I see.  Although we normally think of interrupt contexts as being
>> atomic, in an -rt kernel this isn't true any more because things like
>> spin_lock_irq don't actually disable interrupts.
>>
>> Therefore it would be correct to say that in -rt kernels, runtime PM
>> can be used in interrupt context (if the device is marked as irq-safe),
>> but not in atomic context.  Right?
> 
> Right.
> 
> Whatever is suitable for interrupt context in the mainline, will be suitable
> for that in -rt kernels too. 

Not exactly true :(, since spinlock is converted to [rt_] mutex.
Usually, this difference can't be seen because on -RT kernel all or
mostly all HW IRQ handlers will be forced to be threaded.
For the cases, where such automatic conversion is not working,
(like chained irq handlers or HW-handler+Threaded handler) the code
has to be carefully patched to work properly as for non-RT as for -RT.

Also, this triggers some -RT incompatibility issues, like with PM runtime or
CLK framework (http://www.spinics.net/lists/linux-rt-users/msg13653.html).

> However, what is suitable for the idle loop
> in the mainline, may not be suitable for that in -rt kernels.
> 
> That's why raw_spin_lock/unlock() need to be used within the idle loop.

Indeed. CPU hotplug/CPUIdle is guarded by local_irq_disable()/local_irq_enable().
(example of CPU hotplug RT-issue http://www.spinics.net/lists/arm-kernel/msg438963.html).

I don't want to be the final authority here as my experience with -RT is short.
But, I want to point out on potential issues based on what I've already discovered
and tried to fix.

Thanks & regards,
-grygorii
Rafael J. Wysocki Sept. 7, 2015, 8:42 p.m. UTC | #16
On Monday, September 07, 2015 04:37:44 PM Grygorii Strashko wrote:
> On 09/07/2015 04:04 PM, Rafael J. Wysocki wrote:
> > On Saturday, September 05, 2015 11:39:20 AM Alan Stern wrote:
> >> On Sat, 5 Sep 2015, Grygorii Strashko wrote:
> >>
> >>> On 09/04/2015 09:45 PM, Alan Stern wrote:
> >>>> On Fri, 4 Sep 2015, Grygorii Strashko wrote:
> >>>>
> >>>>> There is one "small" problem with such approach :(
> >>>>>
> >>>>> - It's incompatible with -RT kernel, because PM runtime can't be used
> >>>>> in atomic context on -RT.
> >>>>
> >>>> Can you explain this more fully?  Why can't runtime PM be used in
> >>>> atomic context in the -rt kernels?
> >>>>   
> >>>
> >>> See:
> >>>   http://lwn.net/Articles/146861/
> >>>   https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F
> >>>
> >>> spinlock_t
> >>>      Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
> >>>   do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
> >>>   inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT.
> >>>
> >>> As result, have to do things like:
> >>> https://lkml.org/lkml/2015/8/18/161
> >>> https://lkml.org/lkml/2015/8/18/162
> >>>
> >>> Sorry for brief reply - Friday/Sat night :)
> >>
> >> I see.  Although we normally think of interrupt contexts as being
> >> atomic, in an -rt kernel this isn't true any more because things like
> >> spin_lock_irq don't actually disable interrupts.
> >>
> >> Therefore it would be correct to say that in -rt kernels, runtime PM
> >> can be used in interrupt context (if the device is marked as irq-safe),
> >> but not in atomic context.  Right?
> > 
> > Right.
> > 
> > Whatever is suitable for interrupt context in the mainline, will be suitable
> > for that in -rt kernels too. 
> 
> Not exactly true :(, since spinlock is converted to [rt_] mutex.
> Usually, this difference can't be seen because on -RT kernel all or
> mostly all HW IRQ handlers will be forced to be threaded.

Exactly.  And that's what I'm talking about.

> For the cases, where such automatic conversion is not working,
> (like chained irq handlers or HW-handler+Threaded handler) the code
> has to be carefully patched to work properly as for non-RT as for -RT.

Right.

> Also, this triggers some -RT incompatibility issues, like with PM runtime or

That I'm not sure about.  Why would runtime PM cause problems with -RT (apart
from attempts to use it from the idle loop, but that's not happening in the
mainline anyway)?

> CLK framework (http://www.spinics.net/lists/linux-rt-users/msg13653.html).
>
> > However, what is suitable for the idle loop
> > in the mainline, may not be suitable for that in -rt kernels.
> > 
> > That's why raw_spin_lock/unlock() need to be used within the idle loop.
> 
> Indeed. CPU hotplug/CPUIdle is guarded by local_irq_disable()/local_irq_enable().
> (example of CPU hotplug RT-issue http://www.spinics.net/lists/arm-kernel/msg438963.html).
> 
> I don't want to be the final authority here as my experience with -RT is short.
> But, I want to point out on potential issues based on what I've already discovered
> and tried to fix.

OK

Thanks,
Rafael
Grygorii Strashko Sept. 8, 2015, 8:21 a.m. UTC | #17
On 09/07/2015 11:42 PM, Rafael J. Wysocki wrote:
> On Monday, September 07, 2015 04:37:44 PM Grygorii Strashko wrote:
>> On 09/07/2015 04:04 PM, Rafael J. Wysocki wrote:
>>> On Saturday, September 05, 2015 11:39:20 AM Alan Stern wrote:
>>>> On Sat, 5 Sep 2015, Grygorii Strashko wrote:
>>>>
>>>>> On 09/04/2015 09:45 PM, Alan Stern wrote:
>>>>>> On Fri, 4 Sep 2015, Grygorii Strashko wrote:
>>>>>>
>>>>>>> There is one "small" problem with such approach :(
>>>>>>>
>>>>>>> - It's incompatible with -RT kernel, because PM runtime can't be used
>>>>>>> in atomic context on -RT.
>>>>>>
>>>>>> Can you explain this more fully?  Why can't runtime PM be used in
>>>>>> atomic context in the -rt kernels?
>>>>>>    
>>>>>
>>>>> See:
>>>>>    http://lwn.net/Articles/146861/
>>>>>    https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F
>>>>>
>>>>> spinlock_t
>>>>>       Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
>>>>>    do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
>>>>>    inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT.
>>>>>
>>>>> As result, have to do things like:
>>>>> https://lkml.org/lkml/2015/8/18/161
>>>>> https://lkml.org/lkml/2015/8/18/162
>>>>>
>>>>> Sorry for brief reply - Friday/Sat night :)
>>>>
>>>> I see.  Although we normally think of interrupt contexts as being
>>>> atomic, in an -rt kernel this isn't true any more because things like
>>>> spin_lock_irq don't actually disable interrupts.
>>>>
>>>> Therefore it would be correct to say that in -rt kernels, runtime PM
>>>> can be used in interrupt context (if the device is marked as irq-safe),
>>>> but not in atomic context.  Right?
>>>
>>> Right.
>>>
>>> Whatever is suitable for interrupt context in the mainline, will be suitable
>>> for that in -rt kernels too.
>>
>> Not exactly true :(, since spinlock is converted to [rt_] mutex.
>> Usually, this difference can't be seen because on -RT kernel all or
>> mostly all HW IRQ handlers will be forced to be threaded.
> 
> Exactly.  And that's what I'm talking about.
> 
>> For the cases, where such automatic conversion is not working,
>> (like chained irq handlers or HW-handler+Threaded handler) the code
>> has to be carefully patched to work properly as for non-RT as for -RT.
> 
> Right.
> 
>> Also, this triggers some -RT incompatibility issues, like with PM runtime or
> 
> That I'm not sure about.  Why would runtime PM cause problems with -RT (apart
> from attempts to use it from the idle loop, but that's not happening in the
> mainline anyway)?


I have to be more specific - sorry. "irq_safe" mode of PM runtime is incompatible with -RT.

Here is an example:
- HW IRQ handler in TI OMAP GPIO driver is implemented as chained IRQ handler and
  contains pm_runtime_get_sync()/pm_runtime_put(). This works properly with vanilla kernel
  because OMAP GPIO devices marked as irq_safe.
  Chained IRQ handlers can't be forced threaded and PM runtime APIs trigger
 "sleeping function called from invalid context" issues there, so corresponding code has to be reworked.


...
Kevin Hilman Sept. 8, 2015, 10:03 p.m. UTC | #18
Grygorii Strashko <grygorii.strashko@ti.com> writes:

> On 09/07/2015 11:42 PM, Rafael J. Wysocki wrote:
>> On Monday, September 07, 2015 04:37:44 PM Grygorii Strashko wrote:
>>> On 09/07/2015 04:04 PM, Rafael J. Wysocki wrote:
>>>> On Saturday, September 05, 2015 11:39:20 AM Alan Stern wrote:
>>>>> On Sat, 5 Sep 2015, Grygorii Strashko wrote:
>>>>>
>>>>>> On 09/04/2015 09:45 PM, Alan Stern wrote:
>>>>>>> On Fri, 4 Sep 2015, Grygorii Strashko wrote:
>>>>>>>
>>>>>>>> There is one "small" problem with such approach :(
>>>>>>>>
>>>>>>>> - It's incompatible with -RT kernel, because PM runtime can't be used
>>>>>>>> in atomic context on -RT.
>>>>>>>
>>>>>>> Can you explain this more fully?  Why can't runtime PM be used in
>>>>>>> atomic context in the -rt kernels?
>>>>>>>    
>>>>>>
>>>>>> See:
>>>>>>    http://lwn.net/Articles/146861/
>>>>>>    https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F
>>>>>>
>>>>>> spinlock_t
>>>>>>       Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
>>>>>>    do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
>>>>>>    inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT.
>>>>>>
>>>>>> As result, have to do things like:
>>>>>> https://lkml.org/lkml/2015/8/18/161
>>>>>> https://lkml.org/lkml/2015/8/18/162
>>>>>>
>>>>>> Sorry for brief reply - Friday/Sat night :)
>>>>>
>>>>> I see.  Although we normally think of interrupt contexts as being
>>>>> atomic, in an -rt kernel this isn't true any more because things like
>>>>> spin_lock_irq don't actually disable interrupts.
>>>>>
>>>>> Therefore it would be correct to say that in -rt kernels, runtime PM
>>>>> can be used in interrupt context (if the device is marked as irq-safe),
>>>>> but not in atomic context.  Right?
>>>>
>>>> Right.
>>>>
>>>> Whatever is suitable for interrupt context in the mainline, will be suitable
>>>> for that in -rt kernels too.
>>>
>>> Not exactly true :(, since spinlock is converted to [rt_] mutex.
>>> Usually, this difference can't be seen because on -RT kernel all or
>>> mostly all HW IRQ handlers will be forced to be threaded.
>> 
>> Exactly.  And that's what I'm talking about.
>> 
>>> For the cases, where such automatic conversion is not working,
>>> (like chained irq handlers or HW-handler+Threaded handler) the code
>>> has to be carefully patched to work properly as for non-RT as for -RT.
>> 
>> Right.
>> 
>>> Also, this triggers some -RT incompatibility issues, like with PM runtime or
>> 
>> That I'm not sure about.  Why would runtime PM cause problems with -RT (apart
>> from attempts to use it from the idle loop, but that's not happening in the
>> mainline anyway)?
>
>
> I have to be more specific - sorry. "irq_safe" mode of PM runtime is incompatible with -RT.
>
> Here is an example:
> - HW IRQ handler in TI OMAP GPIO driver is implemented as chained IRQ handler and
>   contains pm_runtime_get_sync()/pm_runtime_put(). This works properly with vanilla kernel
>   because OMAP GPIO devices marked as irq_safe.
>   Chained IRQ handlers can't be forced threaded and PM runtime APIs trigger
>  "sleeping function called from invalid context" issues there, so corresponding code has to be reworked.

Isn't that why those are being converted to raw_*[1] ?

Kevin

[1] http://marc.info/?l=linux-omap&m=143749603401221&w=2
Grygorii Strashko Sept. 10, 2015, 11:01 a.m. UTC | #19
On 09/09/2015 01:03 AM, Kevin Hilman wrote:
> Grygorii Strashko <grygorii.strashko@ti.com> writes:
> 
>> On 09/07/2015 11:42 PM, Rafael J. Wysocki wrote:
>>> On Monday, September 07, 2015 04:37:44 PM Grygorii Strashko wrote:
>>>> On 09/07/2015 04:04 PM, Rafael J. Wysocki wrote:
>>>>> On Saturday, September 05, 2015 11:39:20 AM Alan Stern wrote:
>>>>>> On Sat, 5 Sep 2015, Grygorii Strashko wrote:
>>>>>>
>>>>>>> On 09/04/2015 09:45 PM, Alan Stern wrote:
>>>>>>>> On Fri, 4 Sep 2015, Grygorii Strashko wrote:
>>>>>>>>
>>>>>>>>> There is one "small" problem with such approach :(
>>>>>>>>>
>>>>>>>>> - It's incompatible with -RT kernel, because PM runtime can't be used
>>>>>>>>> in atomic context on -RT.
>>>>>>>>
>>>>>>>> Can you explain this more fully?  Why can't runtime PM be used in
>>>>>>>> atomic context in the -rt kernels?
>>>>>>>>     
>>>>>>>
>>>>>>> See:
>>>>>>>     http://lwn.net/Articles/146861/
>>>>>>>     https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F
>>>>>>>
>>>>>>> spinlock_t
>>>>>>>        Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
>>>>>>>     do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
>>>>>>>     inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT.
>>>>>>>
>>>>>>> As result, have to do things like:
>>>>>>> https://lkml.org/lkml/2015/8/18/161
>>>>>>> https://lkml.org/lkml/2015/8/18/162
>>>>>>>
>>>>>>> Sorry for brief reply - Friday/Sat night :)
>>>>>>
>>>>>> I see.  Although we normally think of interrupt contexts as being
>>>>>> atomic, in an -rt kernel this isn't true any more because things like
>>>>>> spin_lock_irq don't actually disable interrupts.
>>>>>>
>>>>>> Therefore it would be correct to say that in -rt kernels, runtime PM
>>>>>> can be used in interrupt context (if the device is marked as irq-safe),
>>>>>> but not in atomic context.  Right?
>>>>>
>>>>> Right.
>>>>>
>>>>> Whatever is suitable for interrupt context in the mainline, will be suitable
>>>>> for that in -rt kernels too.
>>>>
>>>> Not exactly true :(, since spinlock is converted to [rt_] mutex.
>>>> Usually, this difference can't be seen because on -RT kernel all or
>>>> mostly all HW IRQ handlers will be forced to be threaded.
>>>
>>> Exactly.  And that's what I'm talking about.
>>>
>>>> For the cases, where such automatic conversion is not working,
>>>> (like chained irq handlers or HW-handler+Threaded handler) the code
>>>> has to be carefully patched to work properly as for non-RT as for -RT.
>>>
>>> Right.
>>>
>>>> Also, this triggers some -RT incompatibility issues, like with PM runtime or
>>>
>>> That I'm not sure about.  Why would runtime PM cause problems with -RT (apart
>>> from attempts to use it from the idle loop, but that's not happening in the
>>> mainline anyway)?
>>
>>
>> I have to be more specific - sorry. "irq_safe" mode of PM runtime is incompatible with -RT.
>>
>> Here is an example:
>> - HW IRQ handler in TI OMAP GPIO driver is implemented as chained IRQ handler and
>>    contains pm_runtime_get_sync()/pm_runtime_put(). This works properly with vanilla kernel
>>    because OMAP GPIO devices marked as irq_safe.
>>    Chained IRQ handlers can't be forced threaded and PM runtime APIs trigger
>>   "sleeping function called from invalid context" issues there, so corresponding code has to be reworked.
> 
> Isn't that why those are being converted to raw_*[1] ?
> 
> Kevin
> 
> [1] http://marc.info/?l=linux-omap&m=143749603401221&w=2
> 

That's way I've tried to convert those to generic IRQ handler [2] :), 
so on -RT it will be forced threaded. 

raw_* is different kind of problem in gpio-omap - IRQ controllers
have to use raw_* inside irq_chip callbacks, because IRQ core guards those
callbacks using raw_* locks. 
.irq_bus_lock()/irq_bus_sync_unlock() callbacks can be used [3]
for any kind of operations which require non-atomic context.

[2] https://lkml.org/lkml/2015/8/18/162
gpio: omap: convert to use generic irq handler
[3] https://lkml.org/lkml/2015/8/18/161
gpio: omap: move pm runtime in irq_chip.irq_bus_lock/sync_unlock
Lina Iyer Sept. 22, 2015, 5:32 p.m. UTC | #20
On Thu, Sep 10 2015 at 04:01 -0700, Grygorii Strashko wrote:
>On 09/09/2015 01:03 AM, Kevin Hilman wrote:
>> Grygorii Strashko <grygorii.strashko@ti.com> writes:
>>
>>> On 09/07/2015 11:42 PM, Rafael J. Wysocki wrote:
>>>> On Monday, September 07, 2015 04:37:44 PM Grygorii Strashko wrote:
>>>>> On 09/07/2015 04:04 PM, Rafael J. Wysocki wrote:
>>>>>> On Saturday, September 05, 2015 11:39:20 AM Alan Stern wrote:
>>>>>>> On Sat, 5 Sep 2015, Grygorii Strashko wrote:
>>>>>>>
>>>>>>>> On 09/04/2015 09:45 PM, Alan Stern wrote:
>>>>>>>>> On Fri, 4 Sep 2015, Grygorii Strashko wrote:
>>>>>>>>>
>>>>>>>>>> There is one "small" problem with such approach :(
>>>>>>>>>>
>>>>>>>>>> - It's incompatible with -RT kernel, because PM runtime can't be used
>>>>>>>>>> in atomic context on -RT.
>>>>>>>>>
>>>>>>>>> Can you explain this more fully?  Why can't runtime PM be used in
>>>>>>>>> atomic context in the -rt kernels?
>>>>>>>>>
>>>>>>>>
>>>>>>>> See:
>>>>>>>>     http://lwn.net/Articles/146861/
>>>>>>>>     https://rt.wiki.kernel.org/index.php/Frequently_Asked_Questions#How_does_the_CONFIG_PREEMPT_RT_patch_work.3F
>>>>>>>>
>>>>>>>> spinlock_t
>>>>>>>>        Critical sections are preemptible. The _irq operations (e.g., spin_lock_irqsave())
>>>>>>>>     do -not- disable hardware interrupts. Priority inheritance is used to prevent priority
>>>>>>>>     inversion. An underlying rt_mutex is used to implement spinlock_t in PREEMPT_RT.
>>>>>>>>
>>>>>>>> As result, have to do things like:
>>>>>>>> https://lkml.org/lkml/2015/8/18/161
>>>>>>>> https://lkml.org/lkml/2015/8/18/162
>>>>>>>>
>>>>>>>> Sorry for brief reply - Friday/Sat night :)
>>>>>>>
>>>>>>> I see.  Although we normally think of interrupt contexts as being
>>>>>>> atomic, in an -rt kernel this isn't true any more because things like
>>>>>>> spin_lock_irq don't actually disable interrupts.
>>>>>>>
>>>>>>> Therefore it would be correct to say that in -rt kernels, runtime PM
>>>>>>> can be used in interrupt context (if the device is marked as irq-safe),
>>>>>>> but not in atomic context.  Right?
>>>>>>
>>>>>> Right.
>>>>>>
>>>>>> Whatever is suitable for interrupt context in the mainline, will be suitable
>>>>>> for that in -rt kernels too.
>>>>>
>>>>> Not exactly true :(, since spinlock is converted to [rt_] mutex.
>>>>> Usually, this difference can't be seen because on -RT kernel all or
>>>>> mostly all HW IRQ handlers will be forced to be threaded.
>>>>
>>>> Exactly.  And that's what I'm talking about.
>>>>
>>>>> For the cases, where such automatic conversion is not working,
>>>>> (like chained irq handlers or HW-handler+Threaded handler) the code
>>>>> has to be carefully patched to work properly as for non-RT as for -RT.
>>>>
>>>> Right.
>>>>
>>>>> Also, this triggers some -RT incompatibility issues, like with PM runtime or
>>>>
>>>> That I'm not sure about.  Why would runtime PM cause problems with -RT (apart
>>>> from attempts to use it from the idle loop, but that's not happening in the
>>>> mainline anyway)?
>>>
>>>
>>> I have to be more specific - sorry. "irq_safe" mode of PM runtime is incompatible with -RT.
>>>
>>> Here is an example:
>>> - HW IRQ handler in TI OMAP GPIO driver is implemented as chained IRQ handler and
>>>    contains pm_runtime_get_sync()/pm_runtime_put(). This works properly with vanilla kernel
>>>    because OMAP GPIO devices marked as irq_safe.
>>>    Chained IRQ handlers can't be forced threaded and PM runtime APIs trigger
>>>   "sleeping function called from invalid context" issues there, so corresponding code has to be reworked.
>>
>> Isn't that why those are being converted to raw_*[1] ?
>>
>> Kevin
>>
>> [1] http://marc.info/?l=linux-omap&m=143749603401221&w=2
>>
>
>That's way I've tried to convert those to generic IRQ handler [2] :),
>so on -RT it will be forced threaded.
>
>raw_* is different kind of problem in gpio-omap - IRQ controllers
>have to use raw_* inside irq_chip callbacks, because IRQ core guards those
>callbacks using raw_* locks.
>.irq_bus_lock()/irq_bus_sync_unlock() callbacks can be used [3]
>for any kind of operations which require non-atomic context.
>
Talking to John Stultz at Linaro Connect: Is cpuidle relevant in
-RT kernel? I dont know much about -RT. I thought this might be point
that we should consider.

As it stands today, on a 800 Mhz ARM quad core, I am seeing a latency of
50-70 usec for the additional runtime PM in the cpuidle. Ofcourse, there
is a definite need to optimize and there are opportunities to do that.

Since each CPU does its own runtime PM, we could probably avoid any kind
of locks in the runtime PM. But locks in genpd may be unavoidable. Will
look more into that.

Thanks,
Lina

>[2] https://lkml.org/lkml/2015/8/18/162
>gpio: omap: convert to use generic irq handler
>[3] https://lkml.org/lkml/2015/8/18/161
>gpio: omap: move pm runtime in irq_chip.irq_bus_lock/sync_unlock
>
>
>
>-- 
>regards,
>-grygorii
Thomas Gleixner Sept. 22, 2015, 8:53 p.m. UTC | #21
On Tue, 22 Sep 2015, Lina Iyer wrote:
> Talking to John Stultz at Linaro Connect: Is cpuidle relevant in
> -RT kernel? I dont know much about -RT. I thought this might be point
> that we should consider.

There are definitely RT systems out there where power consumption is a
concern.
 
> As it stands today, on a 800 Mhz ARM quad core, I am seeing a latency of
> 50-70 usec for the additional runtime PM in the cpuidle. Ofcourse, there
> is a definite need to optimize and there are opportunities to do that.

RT is not about being fast. RT is about being deterministic. So if a
power sensitive RT system can afford the extra latencies induced by PM
it will tolerate them. Of course, optimizing that is not a bad thing
at all :)
 
Thanks,

	tglx
diff mbox

Patch

diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index 0496b48..077c55f 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -27,6 +27,7 @@ 
 #include <linux/completion.h>
 #include <linux/cpufreq.h>
 #include <linux/irq_work.h>
+#include <linux/pm_runtime.h>
 
 #include <linux/atomic.h>
 #include <asm/smp.h>
@@ -137,7 +138,6 @@  int __cpu_up(unsigned int cpu, struct task_struct *idle)
 		pr_err("CPU%u: failed to boot: %d\n", cpu, ret);
 	}
 
-
 	memset(&secondary_data, 0, sizeof(secondary_data));
 	return ret;
 }
@@ -271,6 +271,16 @@  void __cpu_die(unsigned int cpu)
 void __ref cpu_die(void)
 {
 	unsigned int cpu = smp_processor_id();
+	struct device *cpu_dev;
+
+	/*
+	 * We dont need the CPU device anymore.
+	 * Lets do this before IRQs are disabled to allow
+	 * runtime PM to suspend the domain as well.
+	 */
+	cpu_dev = get_cpu_device(cpu);
+	if (cpu_dev)
+		pm_runtime_put_sync(cpu_dev);
 
 	idle_task_exit();
 
@@ -352,6 +362,7 @@  asmlinkage void secondary_start_kernel(void)
 {
 	struct mm_struct *mm = &init_mm;
 	unsigned int cpu;
+	struct device *cpu_dev;
 
 	/*
 	 * The identity mapping is uncached (strongly ordered), so
@@ -401,6 +412,11 @@  asmlinkage void secondary_start_kernel(void)
 	local_irq_enable();
 	local_fiq_enable();
 
+	/* We are running, enable runtime PM for the CPU. */
+	cpu_dev = get_cpu_device(cpu);
+	if (cpu_dev)
+		pm_runtime_get_sync(cpu_dev);
+
 	/*
 	 * OK, it's off to the idle thread for us
 	 */