
[3/5] ARM: MCPM: make internal helpers private to the core code

Message ID 1430496392-15956-4-git-send-email-nicolas.pitre@linaro.org (mailing list archive)
State New, archived

Commit Message

Nicolas Pitre May 1, 2015, 4:06 p.m. UTC
This concerns the following helpers:

	__mcpm_cpu_going_down()
	__mcpm_cpu_down()
	__mcpm_outbound_enter_critical()
	__mcpm_outbound_leave_critical()
	__mcpm_cluster_state()

They are and should only be used by the core code now.  Therefore their
declarations are removed from mcpm.h and their definitions are made
static, hence the need to move them before their users which accounts
for the bulk of this patch.

This left the mcpm_sync_struct definition at an odd location, therefore
it is moved as well with some comment clarifications.

Signed-off-by: Nicolas Pitre <nico@linaro.org>
---
 arch/arm/common/mcpm_entry.c | 229 ++++++++++++++++++++++---------------------
 arch/arm/include/asm/mcpm.h  |  52 +++++-----
 2 files changed, 138 insertions(+), 143 deletions(-)

Comments

Chen-Yu Tsai July 24, 2015, 3:54 a.m. UTC | #1
Hi,

On Sat, May 2, 2015 at 12:06 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> This concerns the following helpers:
>
>         __mcpm_cpu_going_down()
>         __mcpm_cpu_down()
>         __mcpm_outbound_enter_critical()
>         __mcpm_outbound_leave_critical()
>         __mcpm_cluster_state()
>
> They are and should only be used by the core code now.  Therefore their
> declarations are removed from mcpm.h and their definitions are made
> static, hence the need to move them before their users which accounts
> for the bulk of this patch.

I'm looking for some advice. On the Allwinner A80, at least on mainline,
there is no external PMU or embedded controller in charge of power
controls. What this means is that I'm doing power sequencing in the
kernel as part of the MCPM calls, specifically powering down cores and
clusters in the .wait_for_powerdown callback. (I don't think it's
reasonable or even possible to power down stuff in .*_powerdown_prepare)

Previously I was using __mcpm_cluster_state() to check if the last core
in a cluster was to be powered off, and thus the whole cluster could be
turned off as well. I could also check if the individual power gates or
resets are asserted, but if a core was already scheduled to be brought
up, and MCPM common framework didn't call .cluster_powerup, there might
be a problem.

Any suggestions? Maybe export __mcpm_cluster_state() so platform code
can know what's going to happen?
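
For reference, the pattern I have in mind is roughly the sketch below.
It is simplified, and the sunxi_* helpers are made-up names, not the
actual driver code:

static int sunxi_wait_for_powerdown(unsigned int cpu, unsigned int cluster)
{
	/* ... poll the WFI status register for this cpu, then gate it ... */
	sunxi_cpu_power_off(cpu, cluster);

	/*
	 * If MCPM says the whole cluster has been torn down, the last
	 * core is gone and the cluster power can be cut as well.
	 */
	if (__mcpm_cluster_state(cluster) == CLUSTER_DOWN)
		sunxi_cluster_power_off(cluster);

	return 0;
}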

Thanks


Regards
ChenYu

> This left the mcpm_sync_struct definition at an odd location, therefore
> it is moved as well with some comment clarifications.
>
> Signed-off-by: Nicolas Pitre <nico@linaro.org>
Dave Martin July 24, 2015, 11:15 a.m. UTC | #2
On Fri, Jul 24, 2015 at 11:54:18AM +0800, Chen-Yu Tsai wrote:
> Hi,
> 
> On Sat, May 2, 2015 at 12:06 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > This concerns the following helpers:
> >
> >         __mcpm_cpu_going_down()
> >         __mcpm_cpu_down()
> >         __mcpm_outbound_enter_critical()
> >         __mcpm_outbound_leave_critical()
> >         __mcpm_cluster_state()
> >
> > They are and should only be used by the core code now.  Therefore their
> > declarations are removed from mcpm.h and their definitions are made
> > static, hence the need to move them before their users which accounts
> > for the bulk of this patch.
> 
> I'm looking for some advice. On the Allwinner A80, at least on mainline,
> there is no external PMU or embedded controller in charge of power
> controls. What this means is that I'm doing power sequencing in the
> kernel as part of the MCPM calls, specifically powering down cores and
> clusters in the .wait_for_powerdown callback. (I don't think it's
> reasonable or even possible to power down stuff in .*_powerdown_prepare)
> 
> Previously I was using __mcpm_cluster_state() to check if the last core
> in a cluster was to be powered off, and thus the whole cluster could be
> turned off as well. I could also check if the individual power gates or
> resets are asserted, but if a core was already scheduled to be brought
> up, and MCPM common framework didn't call .cluster_powerup, there might
> be a problem.

It's been a while since I looked at this stuff in detail, so Nico
may want to put me right ... but here's my 2 cents:


__mcpm_cluster_state() should be considered an implementation detail of
mcpm.  If you feel you need to call it from your driver code, that's
a warning that your code is probably racy -- unless you can explain
otherwise.

When you say you have no external power controller, does this mean
that you need to poll in Linux for the affected cpus/clusters to reach
WFI and then hit the relevant clamps, clocks and/or regulators
yourself?

If so you're effectively implementing a multithreaded power controller
in software, suggesting that you need some state tracking and locking
between the various mcpm methods in your code.  However, you can use
the normal locking and atomics APIs for this.

Since the MCPM lock is held when calling all the relevant methods
except for wait_for_powerdown(), this nearly satisfies the locking
requirement by itself.  You are correct that the hardware operations
associated with physically powering things down will need to be
done in wait_for_powerdown() though, and the MCPM lock is not held
when calling that.


This would motivate two solutions that I can see:

 a) Expose the mcpm lock so that driver code can lock/unlock it
    during critical regions in wait_for_powerdown().

 b) Refactor wait_for_powerdown() so that the locking, sleeping
    and timeout code is moved into the mcpm generic code, and
    the wait_for_powerdown() method is replaced with something
    like "cpu_is_powered_down()".


Further to this, this "software power controller" really just maps one
power sequencing model onto another, so there's a chance it could be
a generic mcpm driver that can be reused across multiple platforms.

Cheers
---Dave
Nicolas Pitre July 24, 2015, 3:44 p.m. UTC | #3
On Fri, 24 Jul 2015, Chen-Yu Tsai wrote:

> Hi,
> 
> On Sat, May 2, 2015 at 12:06 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > This concerns the following helpers:
> >
> >         __mcpm_cpu_going_down()
> >         __mcpm_cpu_down()
> >         __mcpm_outbound_enter_critical()
> >         __mcpm_outbound_leave_critical()
> >         __mcpm_cluster_state()
> >
> > They are and should only be used by the core code now.  Therefore their
> > declarations are removed from mcpm.h and their definitions are made
> > static, hence the need to move them before their users which accounts
> > for the bulk of this patch.
> 
> I'm looking for some advice. On the Allwinner A80, at least on mainline,
> there is no external PMU or embedded controller in charge of power
> controls. What this means is that I'm doing power sequencing in the
> kernel as part of the MCPM calls, specifically powering down cores and
> clusters in the .wait_for_powerdown callback. (I don't think it's
> reasonable or even possible to power down stuff in .*_powerdown_prepare)

Can you tell me more about the power control knobs at your disposal?  Do 
power gates become effective immediately or only when WFI is asserted? 

And can you configure things so a core may be powered up asynchronously 
from an IRQ?

> Previously I was using __mcpm_cluster_state() to check if the last core
> in a cluster was to be powered off, and thus the whole cluster could be
> turned off as well.
> I could also check if the individual power gates or
> resets are asserted, but if a core was already scheduled to be brought
> up, and MCPM common framework didn't call .cluster_powerup, there might
> be a problem.

I fail to see how a core could be scheduled to be brought up without 
deasserting its reset line somehow though.

> Any suggestions? Maybe export __mcpm_cluster_state() so platform code
> can know what's going to happen?

The cluster state may change unexpectedly.  There is a special locking 
sequence and state machine needed to make this information reliable.  
Simply returning the current state wouldn't be enough to ensure 
it can be used race free.

As Dave stated, we might have to supplement the MCPM core code with 
special methods involving a surviving CPU to perform the power-down 
operation on the dying CPU's behalf.  Doing this in .wait_for_powerdown 
is just an abuse of the API.

It also brings up the question if MCPM is actually necessary in that 
case or if you can do without its complexity.  For example, you may look 
at commit 905cdf9dda5d for such a case.  It mainly depends on whether or 
not cores (and the cluster) may be awakened asynchronously upon assertion 
of an IRQ in the context of cpuidle. If the hardware doesn't support 
that then MCPM doesn't bring you any actual benefit.

So it depends on your hardware capabilities.


Nicolas
Chen-Yu Tsai July 25, 2015, 2:41 p.m. UTC | #4
On Fri, Jul 24, 2015 at 7:15 PM, Dave Martin <Dave.Martin@arm.com> wrote:
> On Fri, Jul 24, 2015 at 11:54:18AM +0800, Chen-Yu Tsai wrote:
>> Hi,
>>
>> On Sat, May 2, 2015 at 12:06 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> > This concerns the following helpers:
>> >
>> >         __mcpm_cpu_going_down()
>> >         __mcpm_cpu_down()
>> >         __mcpm_outbound_enter_critical()
>> >         __mcpm_outbound_leave_critical()
>> >         __mcpm_cluster_state()
>> >
>> > They are and should only be used by the core code now.  Therefore their
>> > declarations are removed from mcpm.h and their definitions are made
>> > static, hence the need to move them before their users which accounts
>> > for the bulk of this patch.
>>
>> I'm looking for some advice. On the Allwinner A80, at least on mainline,
>> there is no external PMU or embedded controller in charge of power
>> controls. What this means is that I'm doing power sequencing in the
>> kernel as part of the MCPM calls, specifically powering down cores and
>> clusters in the .wait_for_powerdown callback. (I don't think it's
>> reasonable or even possible to power down stuff in .*_powerdown_prepare)
>>
>> Previously I was using __mcpm_cluster_state() to check if the last core
>> in a cluster was to be powered off, and thus the whole cluster could be
>> turned off as well. I could also check if the individual power gates or
>> resets are asserted, but if a core was already scheduled to be brought
>> up, and MCPM common framework didn't call .cluster_powerup, there might
>> be a problem.
>
> It's been a while since I looked at this stuff in detail, so Nico
> may want to put me right ... but here's my 2 cents:
>
>
> __mcpm_cluster_state() should be considered an implementation detail of
> mcpm.  If you feel you need to call it from your driver code, that's
> a warning that your code is probably racy -- unless you can explain
> otherwise.
>
> When you say you have no external power controller, does this mean
> that you need to poll in Linux for the affected cpus/clusters to reach
> WFI and then hit the relevant clamps, clocks and/or regulators
> yourself?

That's right. Polling a specific register to check for WFI, and gating
the cores. Clocks and external regulators haven't been implemented yet.

(I've also run into a problem where the cores in cluster 0 don't stay
 in WFI, but I think that's probably an implementation bug on my end.)

> If so you're effectively implementing a multithreaded power controller
> in software, suggesting that you need some state tracking and locking
> between the various mcpm methods in your code.  However, you can use
> the normal locking and atomics APIs for this.
>
> Since the MCPM lock is held when calling all the relevant methods
> except for wait_for_powerdown(), this nearly satisfies the locking
> requirement by itself.  You are correct that the hardware operations
> associated with physically powering things down will need to be
> done in wait_for_powerdown() though, and the MCPM lock is not held
> when calling that.

This was my reason for choosing MCPM, not having to handle all the
locking myself.

> This would motivate two solutions that I can see:
>
>  a) Expose the mcpm lock so that driver code can lock/unlock it
>     during critical regions in wait_for_powerdown().

From a user's standpoint, having to do the locking myself is not
so appealing, as the interactions with the rest of the framework
are not as clear, and an overall view of the framework is lacking
for now.

>  b) Refactor wait_for_powerdown() so that the locking, sleeping
>     and timeout code is moved into the mcpm generic code, and
>     the wait_for_powerdown() method is replaced with something
>     like "cpu_is_powered_down()".

This seems to be reversing some patch? Maybe a new (optional)
callback for people who need this? AFAIK other platforms seem
to have some embedded controller that deals with this.

The A80 actually has an embedded controller, but no documents
are available, and the original firmware from Allwinner is not
open source. Furthermore, in Allwinner's kernel, the kernel was
in charge of loading the firmware, a bit backwards IMHO.

Thanks!


Regards
ChenYu

> Further to this, this "software power controller" really just maps one
> power sequencing model onto another, so there's a chance it could be
> a generic mcpm driver that can be reused across multiple platforms.
>
> Cheers
> ---Dave
>
Chen-Yu Tsai July 25, 2015, 2:54 p.m. UTC | #5
On Fri, Jul 24, 2015 at 11:44 PM, Nicolas Pitre
<nicolas.pitre@linaro.org> wrote:
> On Fri, 24 Jul 2015, Chen-Yu Tsai wrote:
>
>> Hi,
>>
>> On Sat, May 2, 2015 at 12:06 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
>> > This concerns the following helpers:
>> >
>> >         __mcpm_cpu_going_down()
>> >         __mcpm_cpu_down()
>> >         __mcpm_outbound_enter_critical()
>> >         __mcpm_outbound_leave_critical()
>> >         __mcpm_cluster_state()
>> >
>> > They are and should only be used by the core code now.  Therefore their
>> > declarations are removed from mcpm.h and their definitions are made
>> > static, hence the need to move them before their users which accounts
>> > for the bulk of this patch.
>>
>> I'm looking for some advice. On the Allwinner A80, at least on mainline,
>> there is no external PMU or embedded controller in charge of power
>> controls. What this means is that I'm doing power sequencing in the
>> kernel as part of the MCPM calls, specifically powering down cores and
>> clusters in the .wait_for_powerdown callback. (I don't think it's
>> reasonable or even possible to power down stuff in .*_powerdown_prepare)
>
> Can you tell me more about the power control knobs at your disposal?  Do
> power gates become effective immediately or only when WFI is asserted?
>
> And can you configure things so a core may be powered up asynchronously
> from an IRQ?

The above probably wasn't clear enough. Power gates, reset controls and
SMP/WFI/WFE status are mapped to various mmio registers. The controls
are effective immediately.

The power gates and reset controls can only be manually controlled.
There is no mainline support for the embedded controller yet, and I
doubt Allwinner's firmware supports it either, as their kernel also
does power sequencing itself. In a nutshell, the kernel is on its
own, we do not support wakeups with IRQs.

>> Previously I was using __mcpm_cluster_state() to check if the last core
>> in a cluster was to be powered off, and thus the whole cluster could be
>> turned off as well.
>> I could also check if the individual power gates or
>> resets are asserted, but if a core was already scheduled to be brought
>> up, and MCPM common framework didn't call .cluster_powerup, there might
>> be a problem.
>
> I fail to see how a core could be scheduled to be brought up without
> deasserting its reset line somehow though.

My point is could there be a race condition in the sequence of events?
Say .*_powerup() deasserted the reset lines _after_ we checked them
in .wait_for_powerdown(). As Dave mentioned, .wait_for_powerdown() is
not called with the MCPM lock held.

But I've resolved to waiting for L2 WFI before powering off clusters.

>> Any suggestions? Maybe export __mcpm_cluster_state() so platform code
>> can know what's going to happen?
>
> The cluster state may change unexpectedly.  There is a special locking
> sequence and state machine needed to make this information reliable.
> Simply returning the current state wouldn't be enough to ensure
> it can be used race free.

I see.

> As Dave stated, we might have to supplement the MCPM core code with
> special methods involving a surviving CPU to perform the power-down
> operation on the dying CPU's behalf.  Doing this in .wait_for_powerdown
> is just an abuse of the API.

The other users I looked at all had other pieces of hardware taking care
of this, so I couldn't really understand where I could put this.

If adding another callback to handle this is acceptable, then I can
look into it.

> It also brings up the question if MCPM is actually necessary in that
> case or if you can do without its complexity.  For example, you may look
> at commit 905cdf9dda5d for such a case.  It mainly depends on whether or
> not cores (and the cluster) may be awakened asynchronously upon assertion 
> of an IRQ in the context of cpuidle. If the hardware doesn't support
> that then MCPM doesn't bring you any actual benefit.
>
> So it depends on your hardware capabilities.

If they're just in WFI, then yes (I think). If they're powered down, they
need surviving cores to power them up.

But I had the impression that MCPM was trying to host common code for
multi-cluster management, such as core reference counting, proper locking
and maybe other stuff. With the MCPM framework, I reimplemented SMP
bringup and CPU hotplugging on the A80 with just half the LOC compared to
Allwinner's implementation, which is just a variation of another
platform's custom MCPM code.


Regards
ChenYu
Nicolas Pitre July 27, 2015, 4:43 a.m. UTC | #6
On Sat, 25 Jul 2015, Chen-Yu Tsai wrote:

> On Fri, Jul 24, 2015 at 11:44 PM, Nicolas Pitre
> <nicolas.pitre@linaro.org> wrote:
> > On Fri, 24 Jul 2015, Chen-Yu Tsai wrote:
> >
> >> Hi,
> >>
> >> On Sat, May 2, 2015 at 12:06 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> >> > This concerns the following helpers:
> >> >
> >> >         __mcpm_cpu_going_down()
> >> >         __mcpm_cpu_down()
> >> >         __mcpm_outbound_enter_critical()
> >> >         __mcpm_outbound_leave_critical()
> >> >         __mcpm_cluster_state()
> >> >
> >> > They are and should only be used by the core code now.  Therefore their
> >> > declarations are removed from mcpm.h and their definitions are made
> >> > static, hence the need to move them before their users which accounts
> >> > for the bulk of this patch.
> >>
> >> I'm looking for some advice. On the Allwinner A80, at least on mainline,
> >> there is no external PMU or embedded controller in charge of power
> >> controls. What this means is that I'm doing power sequencing in the
> >> kernel as part of the MCPM calls, specifically powering down cores and
> >> clusters in the .wait_for_powerdown callback. (I don't think it's
> >> reasonable or even possible to power down stuff in .*_powerdown_prepare)
> >
> > Can you tell me more about the power control knobs at your disposal?  Do
> > power gates become effective immediately or only when WFI is asserted?
> >
> > And can you configure things so a core may be powered up asynchronously
> > from an IRQ?
> 
> The above probably wasn't clear enough. Power gates, reset controls and
> SMP/WFI/WFE status are mapped to various mmio registers. The controls
> are effective immediately.
> 
> The power gates and reset controls can only be manually controlled.
> There is no mainline support for the embedded controller yet, and I
> doubt Allwinner's firmware supports it either, as their kernel also
> does power sequencing itself. In a nutshell, the kernel is on its
> own, we do not support wakeups with IRQs.
> 
> >> Previously I was using __mcpm_cluster_state() to check if the last core
> >> in a cluster was to be powered off, and thus the whole cluster could be
> >> turned off as well.
> >> I could also check if the individual power gates or
> >> resets are asserted, but if a core was already scheduled to be brought
> >> up, and MCPM common framework didn't call .cluster_powerup, there might
> >> be a problem.
> >
> > I fail to see how a core could be scheduled to be brought up without
> > deasserting its reset line somehow though.
> 
> My point is could there be a race condition in the sequence of events?
> Say .*_powerup() deasserted the reset lines _after_ we checked them
> in .wait_for_powerdown(). As Dave mentioned, .wait_for_powerdown() is
> not called with the MCPM lock held.

In theory this should never happen.  Even if .wait_for_powerdown() was 
prevented from running concurrently with .*_powerup(), nothing would 
prevent .*_powerup() from running the moment .wait_for_powerdown() 
has returned.  So it is up to the higher level not to power up a CPU and 
wait for it to be down at the same time since this simply makes no 
sense... unless it really wants the CPU back right away.

> But I've resolved to waiting for L2 WFI before powering off clusters.
> 
> >> Any suggestions? Maybe export __mcpm_cluster_state() so platform code
> >> can know what's going to happen?
> >
> > The cluster state may change unexpectedly.  There is a special locking
> > sequence and state machine needed to make this information reliable.
> > Simply returning the current state wouldn't be enough to ensure
> > it can be used race free.
> 
> I see.
> 
> > As Dave stated, we might have to supplement the MCPM core code with
> > special methods involving a surviving CPU to perform the power-down
> > operation on the dying CPU's behalf.  Doing this in .wait_for_powerdown
> > is just an abuse of the API.
> 
> The other users I looked at all had other pieces of hardware taking care
> of this, so I couldn't really understand where I could put this.

I think what should be done in this case is simply to put the task of 
killing the CPU power on a work queue that gets executed by another CPU. 
Queueing the necessary work could be done from the MCPM core code the 
moment a special method exists in the machine backend structure. The 
work callback would take the MCPM lock, make sure the CPU/cluster state 
is "DOWN" and call that special method.

A call to .wait_for_powerdown() should not be responsible for the actual 
power down. It should remain optional.

> If adding another callback to handle this is acceptable, then I can
> look into it.

Please be my guest.


Nicolas
Dave Martin July 27, 2015, 11:38 a.m. UTC | #7
On Mon, Jul 27, 2015 at 12:43:28AM -0400, Nicolas Pitre wrote:
> On Sat, 25 Jul 2015, Chen-Yu Tsai wrote:
> 
> > On Fri, Jul 24, 2015 at 11:44 PM, Nicolas Pitre
> > <nicolas.pitre@linaro.org> wrote:
> > > On Fri, 24 Jul 2015, Chen-Yu Tsai wrote:
> > >
> > >> Hi,
> > >>
> > >> On Sat, May 2, 2015 at 12:06 AM, Nicolas Pitre <nicolas.pitre@linaro.org> wrote:
> > >> > This concerns the following helpers:
> > >> >
> > >> >         __mcpm_cpu_going_down()
> > >> >         __mcpm_cpu_down()
> > >> >         __mcpm_outbound_enter_critical()
> > >> >         __mcpm_outbound_leave_critical()
> > >> >         __mcpm_cluster_state()
> > >> >
> > >> > They are and should only be used by the core code now.  Therefore their
> > >> > declarations are removed from mcpm.h and their definitions are made
> > >> > static, hence the need to move them before their users which accounts
> > >> > for the bulk of this patch.
> > >>
> > >> I'm looking for some advice. On the Allwinner A80, at least on mainline,
> > >> there is no external PMU or embedded controller in charge of power
> > >> controls. What this means is that I'm doing power sequencing in the
> > >> kernel as part of the MCPM calls, specifically powering down cores and
> > >> clusters in the .wait_for_powerdown callback. (I don't think it's
> > >> reasonable or even possible to power down stuff in .*_powerdown_prepare)
> > >
> > > Can you tell me more about the power control knobs at your disposal?  Do
> > > power gates become effective immediately or only when WFI is asserted?
> > >
> > > And can you configure things so a core may be powered up asynchronously
> > > from an IRQ?
> > 
> > The above probably wasn't clear enough. Power gates, reset controls and
> > SMP/WFI/WFE status are mapped to various mmio registers. The controls
> > are effective immediately.
> > 
> > The power gates and reset controls can only be manually controlled.
> > There is no mainline support for the embedded controller yet, and I
> > doubt Allwinner's firmware supports it either, as their kernel also
> > does power sequencing itself. In a nutshell, the kernel is on its
> > own, we do not support wakeups with IRQs.
> > 
> > >> Previously I was using __mcpm_cluster_state() to check if the last core
> > >> in a cluster was to be powered off, and thus the whole cluster could be
> > >> turned off as well.
> > >> I could also check if the individual power gates or
> > >> resets are asserted, but if a core was already scheduled to be brought
> > >> up, and MCPM common framework didn't call .cluster_powerup, there might
> > >> be a problem.
> > >
> > > I fail to see how a core could be scheduled to be brought up without
> > > deasserting its reset line somehow though.
> > 
> > My point is could there be a race condition in the sequence of events?
> > Say .*_powerup() deasserted the reset lines _after_ we checked them
> > in .wait_for_powerdown(). As Dave mentioned, .wait_for_powerdown() is
> > not called with the MCPM lock held.
> 
> In theory this should never happen.  Even if .wait_for_powerdown() was 
> prevented from running concurrently with .*_powerup(), nothing would 
> prevent .*_powerup() from running the moment .wait_for_powerdown() 
> has returned.  So it is up to the higher level not to power up a CPU and 
> wait for it to be down at the same time since this simply makes no 
> sense... unless it really wants the CPU back right away.
> 
> > But I've resolved to waiting for L2 WFI before powering off clusters.
> > 
> > >> Any suggestions? Maybe export __mcpm_cluster_state() so platform code
> > >> can know what's going to happen?
> > >
> > > The cluster state may change unexpectedly.  There is a special locking
> > > sequence and state machine needed to make this information reliable.
> > > Simply returning the current state wouldn't be enough to ensure
> > > it can be used race free.
> > 
> > I see.
> > 
> > > As Dave stated, we might have to supplement the MCPM core code with
> > > special methods involving a surviving CPU to perform the power-down
> > > operation on the dying CPU's behalf.  Doing this in .wait_for_powerdown
> > > is just an abuse of the API.
> > 
> > The other users I looked at all had other pieces of hardware taking care
> > of this, so I couldn't really understand where I could put this.
> 
> I think what should be done in this case is simply to put the task of 
> killing the CPU power on a work queue that gets executed by another CPU. 
> Queueing the necessary work could be done from the MCPM core code the 
> moment a special method exists in the machine backend structure. The 
> work callback would take the MCPM lock, make sure the CPU/cluster state 
> is "DOWN" and call that special method.
> 
> A call to .wait_for_powerdown() should not be responsible for the actual 
> power down. It should remain optional.
> 
> > If adding another callback to handle this is acceptable, then I can
> > look into it.
> 
> Please be my guest.

I think I originally named .wait_for_powerdown ".power_down_finish", or
something like that, with the expectation that this would both do the
serialisation and the last rites.

The dual role was a bit confusing though, and doesn't necessarily fit
the framework perfectly.

Having an optional callback for the "last rites" part probably would
be clearer.

Cheers
---Dave

Patch

diff --git a/arch/arm/common/mcpm_entry.c b/arch/arm/common/mcpm_entry.c
index 0908f96278..c5fe2e33e6 100644
--- a/arch/arm/common/mcpm_entry.c
+++ b/arch/arm/common/mcpm_entry.c
@@ -20,6 +20,121 @@ 
 #include <asm/cputype.h>
 #include <asm/suspend.h>
 
+
+struct sync_struct mcpm_sync;
+
+/*
+ * __mcpm_cpu_going_down: Indicates that the cpu is being torn down.
+ *    This must be called at the point of committing to teardown of a CPU.
+ *    The CPU cache (SCTRL.C bit) is expected to still be active.
+ */
+static void __mcpm_cpu_going_down(unsigned int cpu, unsigned int cluster)
+{
+	mcpm_sync.clusters[cluster].cpus[cpu].cpu = CPU_GOING_DOWN;
+	sync_cache_w(&mcpm_sync.clusters[cluster].cpus[cpu].cpu);
+}
+
+/*
+ * __mcpm_cpu_down: Indicates that cpu teardown is complete and that the
+ *    cluster can be torn down without disrupting this CPU.
+ *    To avoid deadlocks, this must be called before a CPU is powered down.
+ *    The CPU cache (SCTRL.C bit) is expected to be off.
+ *    However L2 cache might or might not be active.
+ */
+static void __mcpm_cpu_down(unsigned int cpu, unsigned int cluster)
+{
+	dmb();
+	mcpm_sync.clusters[cluster].cpus[cpu].cpu = CPU_DOWN;
+	sync_cache_w(&mcpm_sync.clusters[cluster].cpus[cpu].cpu);
+	sev();
+}
+
+/*
+ * __mcpm_outbound_leave_critical: Leave the cluster teardown critical section.
+ * @state: the final state of the cluster:
+ *     CLUSTER_UP: no destructive teardown was done and the cluster has been
+ *         restored to the previous state (CPU cache still active); or
+ *     CLUSTER_DOWN: the cluster has been torn-down, ready for power-off
+ *         (CPU cache disabled, L2 cache either enabled or disabled).
+ */
+static void __mcpm_outbound_leave_critical(unsigned int cluster, int state)
+{
+	dmb();
+	mcpm_sync.clusters[cluster].cluster = state;
+	sync_cache_w(&mcpm_sync.clusters[cluster].cluster);
+	sev();
+}
+
+/*
+ * __mcpm_outbound_enter_critical: Enter the cluster teardown critical section.
+ * This function should be called by the last man, after local CPU teardown
+ * is complete.  CPU cache expected to be active.
+ *
+ * Returns:
+ *     false: the critical section was not entered because an inbound CPU was
+ *         observed, or the cluster is already being set up;
+ *     true: the critical section was entered: it is now safe to tear down the
+ *         cluster.
+ */
+static bool __mcpm_outbound_enter_critical(unsigned int cpu, unsigned int cluster)
+{
+	unsigned int i;
+	struct mcpm_sync_struct *c = &mcpm_sync.clusters[cluster];
+
+	/* Warn inbound CPUs that the cluster is being torn down: */
+	c->cluster = CLUSTER_GOING_DOWN;
+	sync_cache_w(&c->cluster);
+
+	/* Back out if the inbound cluster is already in the critical region: */
+	sync_cache_r(&c->inbound);
+	if (c->inbound == INBOUND_COMING_UP)
+		goto abort;
+
+	/*
+	 * Wait for all CPUs to get out of the GOING_DOWN state, so that local
+	 * teardown is complete on each CPU before tearing down the cluster.
+	 *
+	 * If any CPU has been woken up again from the DOWN state, then we
+	 * shouldn't be taking the cluster down at all: abort in that case.
+	 */
+	sync_cache_r(&c->cpus);
+	for (i = 0; i < MAX_CPUS_PER_CLUSTER; i++) {
+		int cpustate;
+
+		if (i == cpu)
+			continue;
+
+		while (1) {
+			cpustate = c->cpus[i].cpu;
+			if (cpustate != CPU_GOING_DOWN)
+				break;
+
+			wfe();
+			sync_cache_r(&c->cpus[i].cpu);
+		}
+
+		switch (cpustate) {
+		case CPU_DOWN:
+			continue;
+
+		default:
+			goto abort;
+		}
+	}
+
+	return true;
+
+abort:
+	__mcpm_outbound_leave_critical(cluster, CLUSTER_UP);
+	return false;
+}
+
+static int __mcpm_cluster_state(unsigned int cluster)
+{
+	sync_cache_r(&mcpm_sync.clusters[cluster].cluster);
+	return mcpm_sync.clusters[cluster].cluster;
+}
+
 extern unsigned long mcpm_entry_vectors[MAX_NR_CLUSTERS][MAX_CPUS_PER_CLUSTER];
 
 void mcpm_set_entry_vector(unsigned cpu, unsigned cluster, void *ptr)
@@ -299,120 +414,6 @@  int __init mcpm_loopback(void (*cache_disable)(void))
 
 #endif
 
-struct sync_struct mcpm_sync;
-
-/*
- * __mcpm_cpu_going_down: Indicates that the cpu is being torn down.
- *    This must be called at the point of committing to teardown of a CPU.
- *    The CPU cache (SCTRL.C bit) is expected to still be active.
- */
-void __mcpm_cpu_going_down(unsigned int cpu, unsigned int cluster)
-{
-	mcpm_sync.clusters[cluster].cpus[cpu].cpu = CPU_GOING_DOWN;
-	sync_cache_w(&mcpm_sync.clusters[cluster].cpus[cpu].cpu);
-}
-
-/*
- * __mcpm_cpu_down: Indicates that cpu teardown is complete and that the
- *    cluster can be torn down without disrupting this CPU.
- *    To avoid deadlocks, this must be called before a CPU is powered down.
- *    The CPU cache (SCTRL.C bit) is expected to be off.
- *    However L2 cache might or might not be active.
- */
-void __mcpm_cpu_down(unsigned int cpu, unsigned int cluster)
-{
-	dmb();
-	mcpm_sync.clusters[cluster].cpus[cpu].cpu = CPU_DOWN;
-	sync_cache_w(&mcpm_sync.clusters[cluster].cpus[cpu].cpu);
-	sev();
-}
-
-/*
- * __mcpm_outbound_leave_critical: Leave the cluster teardown critical section.
- * @state: the final state of the cluster:
- *     CLUSTER_UP: no destructive teardown was done and the cluster has been
- *         restored to the previous state (CPU cache still active); or
- *     CLUSTER_DOWN: the cluster has been torn-down, ready for power-off
- *         (CPU cache disabled, L2 cache either enabled or disabled).
- */
-void __mcpm_outbound_leave_critical(unsigned int cluster, int state)
-{
-	dmb();
-	mcpm_sync.clusters[cluster].cluster = state;
-	sync_cache_w(&mcpm_sync.clusters[cluster].cluster);
-	sev();
-}
-
-/*
- * __mcpm_outbound_enter_critical: Enter the cluster teardown critical section.
- * This function should be called by the last man, after local CPU teardown
- * is complete.  CPU cache expected to be active.
- *
- * Returns:
- *     false: the critical section was not entered because an inbound CPU was
- *         observed, or the cluster is already being set up;
- *     true: the critical section was entered: it is now safe to tear down the
- *         cluster.
- */
-bool __mcpm_outbound_enter_critical(unsigned int cpu, unsigned int cluster)
-{
-	unsigned int i;
-	struct mcpm_sync_struct *c = &mcpm_sync.clusters[cluster];
-
-	/* Warn inbound CPUs that the cluster is being torn down: */
-	c->cluster = CLUSTER_GOING_DOWN;
-	sync_cache_w(&c->cluster);
-
-	/* Back out if the inbound cluster is already in the critical region: */
-	sync_cache_r(&c->inbound);
-	if (c->inbound == INBOUND_COMING_UP)
-		goto abort;
-
-	/*
-	 * Wait for all CPUs to get out of the GOING_DOWN state, so that local
-	 * teardown is complete on each CPU before tearing down the cluster.
-	 *
-	 * If any CPU has been woken up again from the DOWN state, then we
-	 * shouldn't be taking the cluster down at all: abort in that case.
-	 */
-	sync_cache_r(&c->cpus);
-	for (i = 0; i < MAX_CPUS_PER_CLUSTER; i++) {
-		int cpustate;
-
-		if (i == cpu)
-			continue;
-
-		while (1) {
-			cpustate = c->cpus[i].cpu;
-			if (cpustate != CPU_GOING_DOWN)
-				break;
-
-			wfe();
-			sync_cache_r(&c->cpus[i].cpu);
-		}
-
-		switch (cpustate) {
-		case CPU_DOWN:
-			continue;
-
-		default:
-			goto abort;
-		}
-	}
-
-	return true;
-
-abort:
-	__mcpm_outbound_leave_critical(cluster, CLUSTER_UP);
-	return false;
-}
-
-int __mcpm_cluster_state(unsigned int cluster)
-{
-	sync_cache_r(&mcpm_sync.clusters[cluster].cluster);
-	return mcpm_sync.clusters[cluster].cluster;
-}
-
 extern unsigned long mcpm_power_up_setup_phys;
 
 int __init mcpm_sync_init(
diff --git a/arch/arm/include/asm/mcpm.h b/arch/arm/include/asm/mcpm.h
index e2118c941d..6a40d5f8db 100644
--- a/arch/arm/include/asm/mcpm.h
+++ b/arch/arm/include/asm/mcpm.h
@@ -245,35 +245,6 @@  struct mcpm_platform_ops {
  */
 int __init mcpm_platform_register(const struct mcpm_platform_ops *ops);
 
-/* Synchronisation structures for coordinating safe cluster setup/teardown: */
-
-/*
- * When modifying this structure, make sure you update the MCPM_SYNC_ defines
- * to match.
- */
-struct mcpm_sync_struct {
-	/* individual CPU states */
-	struct {
-		s8 cpu __aligned(__CACHE_WRITEBACK_GRANULE);
-	} cpus[MAX_CPUS_PER_CLUSTER];
-
-	/* cluster state */
-	s8 cluster __aligned(__CACHE_WRITEBACK_GRANULE);
-
-	/* inbound-side state */
-	s8 inbound __aligned(__CACHE_WRITEBACK_GRANULE);
-};
-
-struct sync_struct {
-	struct mcpm_sync_struct clusters[MAX_NR_CLUSTERS];
-};
-
-void __mcpm_cpu_going_down(unsigned int cpu, unsigned int cluster);
-void __mcpm_cpu_down(unsigned int cpu, unsigned int cluster);
-void __mcpm_outbound_leave_critical(unsigned int cluster, int state);
-bool __mcpm_outbound_enter_critical(unsigned int this_cpu, unsigned int cluster);
-int __mcpm_cluster_state(unsigned int cluster);
-
 /**
  * mcpm_sync_init - Initialize the cluster synchronization support
  *
@@ -312,6 +283,29 @@  int __init mcpm_loopback(void (*cache_disable)(void));
 
 void __init mcpm_smp_set_ops(void);
 
+/*
+ * Synchronisation structures for coordinating safe cluster setup/teardown.
+ * This is private to the MCPM core code and shared between C and assembly.
+ * When modifying this structure, make sure you update the MCPM_SYNC_ defines
+ * to match.
+ */
+struct mcpm_sync_struct {
+	/* individual CPU states */
+	struct {
+		s8 cpu __aligned(__CACHE_WRITEBACK_GRANULE);
+	} cpus[MAX_CPUS_PER_CLUSTER];
+
+	/* cluster state */
+	s8 cluster __aligned(__CACHE_WRITEBACK_GRANULE);
+
+	/* inbound-side state */
+	s8 inbound __aligned(__CACHE_WRITEBACK_GRANULE);
+};
+
+struct sync_struct {
+	struct mcpm_sync_struct clusters[MAX_NR_CLUSTERS];
+};
+
 #else
 
 /*