diff mbox

[RFC,1/2] PM / OPP: add support to specify phandle of another node for OPP

Message ID 1375207217-4433-2-git-send-email-Sudeep.KarkadaNagesha@arm.com (mailing list archive)
State RFC, archived
Headers show

Commit Message

Sudeep KarkadaNagesha July 30, 2013, 6 p.m. UTC
From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>

If more than one similar devices share the same OPPs, currently we
need to replicate the OPP entries in all the nodes.

Few drivers like cpufreq depend on physical cpu0 node to specify the
OPPs and only that node is referred irrespective of the logical cpu
accessing it. Alternatively to support cpuhotplug path, few drivers
parse all the cpu nodes for OPPs. Instead we can specify the phandle
of the node with which the current node shares the operating points.

This patch adds support to specify the phandle in the operating points
of any device node, where the node specified by the phandle holds the
actual OPPs.

Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Stephen Warren <swarren@wwwdotorg.org>
Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
Cc: Nishanth Menon <nm@ti.com>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
---
 Documentation/devicetree/bindings/power/opp.txt | 114 +++++++++++++++++++++---
 drivers/base/power/opp.c                        |  12 ++-
 2 files changed, 112 insertions(+), 14 deletions(-)

Comments

Stephen Warren July 30, 2013, 6:34 p.m. UTC | #1
On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
> 
> If more than one similar devices share the same OPPs, currently we
> need to replicate the OPP entries in all the nodes.
> 
> Few drivers like cpufreq depend on physical cpu0 node to specify the
> OPPs and only that node is referred irrespective of the logical cpu
> accessing it. Alternatively to support cpuhotplug path, few drivers
> parse all the cpu nodes for OPPs. Instead we can specify the phandle
> of the node with which the current node shares the operating points.
> 
> This patch adds support to specify the phandle in the operating points
> of any device node, where the node specified by the phandle holds the
> actual OPPs.

> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt

> +Optional properties:
> +- operating-points-phandle: phandle to the device node with which this

That's a funny name. Bikeshedding a bit, how about shared-operating-points?

I haven't thought at all about whether this change conceptually makes sense.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon July 30, 2013, 8:48 p.m. UTC | #2
On 07/30/2013 01:34 PM, Stephen Warren wrote:
> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>
>> If more than one similar devices share the same OPPs, currently we
>> need to replicate the OPP entries in all the nodes.
>>
>> Few drivers like cpufreq depend on physical cpu0 node to specify the
>> OPPs and only that node is referred irrespective of the logical cpu
>> accessing it. Alternatively to support cpuhotplug path, few drivers
>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
>> of the node with which the current node shares the operating points.
>>
>> This patch adds support to specify the phandle in the operating points
>> of any device node, where the node specified by the phandle holds the
>> actual OPPs.
>
>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
>
>> +Optional properties:
>> +- operating-points-phandle: phandle to the device node with which this
>
> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
>
> I haven't thought at all about whether this change conceptually makes sense.
>

They may not really be shared- we could have phandle list even. one 
might have optional OPP sets for a chip family that one may  - I was 
about to suggest something similar to pinctrl

operating-points-names = "default", "performance", "cheapboard-config" ;)
operating-points-0 = <&...>
operating-points-1 = <&...>
operating-points-2 = <&...>

+ wanted also to consider how we might have a single definition to scale 
across to what Mike is attempting to do with a generic clock framework 
support for DVFS.

for compatibility sake, if operating-points is defined, we continue to 
expect old style definition, else we have options to pick from.

This should setup stage for many of the work we have been trying to 
figure out on AM/OMAP and few other processors which has to depend on 
few sets of OPPs which may not be supported on various platforms.

I am hoping our phandle solution could scale to all needed.
Stephen Warren July 30, 2013, 9:25 p.m. UTC | #3
On 07/30/2013 02:48 PM, Nishanth Menon wrote:
> On 07/30/2013 01:34 PM, Stephen Warren wrote:
>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>>
>>> If more than one similar devices share the same OPPs, currently we
>>> need to replicate the OPP entries in all the nodes.
>>>
>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
>>> OPPs and only that node is referred irrespective of the logical cpu
>>> accessing it. Alternatively to support cpuhotplug path, few drivers
>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
>>> of the node with which the current node shares the operating points.
>>>
>>> This patch adds support to specify the phandle in the operating points
>>> of any device node, where the node specified by the phandle holds the
>>> actual OPPs.
>>
>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt
>>> b/Documentation/devicetree/bindings/power/opp.txt
>>
>>> +Optional properties:
>>> +- operating-points-phandle: phandle to the device node with which this
>>
>> That's a funny name. Bikeshedding a bit, how about
>> shared-operating-points?
>>
>> I haven't thought at all about whether this change conceptually makes
>> sense.
>>
> 
> They may not really be shared- we could have phandle list even.

Well, they are shared, or you wouldn't have one node pointing at another
node and hence sharing the same property...

> one
> might have optional OPP sets for a chip family that one may  - I was
> about to suggest something similar to pinctrl
> 
> operating-points-names = "default", "performance", "cheapboard-config" ;)
> operating-points-0 = <&...>
> operating-points-1 = <&...>
> operating-points-2 = <&...>

There is an assertion that DT should only represent the absolute max
limits for things like this, and not policy-oriented data such as
different performance profiles. I don't expect you'll see anything like
the above in DT, since it's more policy than HW description.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep KarkadaNagesha July 31, 2013, 10:46 a.m. UTC | #4
On 30/07/13 19:34, Stephen Warren wrote:
> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>
>> If more than one similar devices share the same OPPs, currently we
>> need to replicate the OPP entries in all the nodes.
>>
>> Few drivers like cpufreq depend on physical cpu0 node to specify the
>> OPPs and only that node is referred irrespective of the logical cpu
>> accessing it. Alternatively to support cpuhotplug path, few drivers
>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
>> of the node with which the current node shares the operating points.
>>
>> This patch adds support to specify the phandle in the operating points
>> of any device node, where the node specified by the phandle holds the
>> actual OPPs.
> 
>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> 
>> +Optional properties:
>> +- operating-points-phandle: phandle to the device node with which this
> 
> That's a funny name. Bikeshedding a bit, how about shared-operating-points?

shared-operating-points makes sense, but I was thinking that name should
indicate it's reference to the device that it shares OPP with. I agree
'operating-points-phandle' is no good, is 'shared-opp-device' any better ?

Regards,
Sudeep


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep KarkadaNagesha July 31, 2013, 11:14 a.m. UTC | #5
On 30/07/13 21:48, Nishanth Menon wrote:
> On 07/30/2013 01:34 PM, Stephen Warren wrote:
>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>>
>>> If more than one similar devices share the same OPPs, currently we
>>> need to replicate the OPP entries in all the nodes.
>>>
>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
>>> OPPs and only that node is referred irrespective of the logical cpu
>>> accessing it. Alternatively to support cpuhotplug path, few drivers
>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
>>> of the node with which the current node shares the operating points.
>>>
>>> This patch adds support to specify the phandle in the operating points
>>> of any device node, where the node specified by the phandle holds the
>>> actual OPPs.
>>
>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
>>
>>> +Optional properties:
>>> +- operating-points-phandle: phandle to the device node with which this
>>
>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
>>
>> I haven't thought at all about whether this change conceptually makes sense.
>>
> 
> They may not really be shared- we could have phandle list even. one 
> might have optional OPP sets for a chip family that one may  - I was 
> about to suggest something similar to pinctrl
> 
I am not sure if I follow you here, if each chip family has its unique
set of OPPs, why do we need to represent all of them together ?
IIUC you are thinking about having these in include dts file, used by
multiple chip/board dts.

> operating-points-names = "default", "performance", "cheapboard-config" ;)
> operating-points-0 = <&...>
> operating-points-1 = <&...>
> operating-points-2 = <&...>
> 
This looks more like a PM policy.

> + wanted also to consider how we might have a single definition to scale 
> across to what Mike is attempting to do with a generic clock framework 
> support for DVFS.
> 
I don't quite follow what you are trying to say.

In fact, following Mike's consolidation I had a thought that OPP must be
part of clock node as multiple devices in the same clock domain refer to
the same clock node.

> for compatibility sake, if operating-points is defined, we continue to 
> expect old style definition, else we have options to pick from.
> 
Yes we can do that, but we need to agree on where these OPPs need to
present in DTS.

> This should setup stage for many of the work we have been trying to 
> figure out on AM/OMAP and few other processors which has to depend on 
> few sets of OPPs which may not be supported on various platforms.
> 
I still don't get the point why you would publish some OPP in the DT
when the hardware which it describes doesn't support it.

This may be already discussed when DT support was added to OPP library,
IMO if for some reason the firmware/boot entity disables some of the
OPPs, then it can append OPPs in DT with the state(enabled/disabled).
But this needs extension of current binding.

Regards,
Sudeep

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon July 31, 2013, 2:46 p.m. UTC | #6
On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
> On 30/07/13 21:48, Nishanth Menon wrote:
>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>>>
>>>> If more than one similar devices share the same OPPs, currently we
>>>> need to replicate the OPP entries in all the nodes.
>>>>
>>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
>>>> OPPs and only that node is referred irrespective of the logical cpu
>>>> accessing it. Alternatively to support cpuhotplug path, few drivers
>>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
>>>> of the node with which the current node shares the operating points.
>>>>
>>>> This patch adds support to specify the phandle in the operating points
>>>> of any device node, where the node specified by the phandle holds the
>>>> actual OPPs.
>>>
>>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
>>>
>>>> +Optional properties:
>>>> +- operating-points-phandle: phandle to the device node with which this
>>>
>>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
>>>
>>> I haven't thought at all about whether this change conceptually makes sense.
>>>
>>
>> They may not really be shared- we could have phandle list even. one
>> might have optional OPP sets for a chip family that one may  - I was
>> about to suggest something similar to pinctrl
>>
> I am not sure if I follow you here, if each chip family has its unique
> set of OPPs, why do we need to represent all of them together ?
> IIUC you are thinking about having these in include dts file, used by
> multiple chip/board dts.
>
>> operating-points-names = "default", "performance", "cheapboard-config" ;)
>> operating-points-0 = <&...>
>> operating-points-1 = <&...>
>> operating-points-2 = <&...>
>>
> This looks more like a PM policy.

Let me try to explain since SoCs such as OMAP/AM family dont make life 
trivial :)..

An legacy example[1][2]

SoC DM explains that the chip is capable of X opps:
opp1, 2 - for all devices
opp1,2, 3 - if efuse bit X@y is set
opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors 
requirements (including additional features A, B is enabled).

So, the same chip family has a hardware feature - not just as a pm 
policy of selecting among a set of OPPs which opp to work on, but the 
actual set of OPPs are actually options in themselves that is selected 
based on board's SoC selection.

this you could in theory, compare to selecting an pinctrl configuration 
option based on certain hardware selection criteria.

>

>> + wanted also to consider how we might have a single definition to scale
>> across to what Mike is attempting to do with a generic clock framework
>> support for DVFS.
>>
> I don't quite follow what you are trying to say.
>
> In fact, following Mike's consolidation I had a thought that OPP must be
> part of clock node as multiple devices in the same clock domain refer to
> the same clock node.

just that the configuration and option we select *must* think beyond 
just CPU with a generic clock framework triggered dvfs - the strategy 
must work for devfreq/other frameworks as well.

>
>> for compatibility sake, if operating-points is defined, we continue to
>> expect old style definition, else we have options to pick from.
>>
> Yes we can do that, but we need to agree on where these OPPs need to
> present in DTS.

I see Grant is in alignment, and I personally like the idea as well - as 
long as we dont break backward compatibility.

>
>> This should setup stage for many of the work we have been trying to
>> figure out on AM/OMAP and few other processors which has to depend on
>> few sets of OPPs which may not be supported on various platforms.
>>
> I still don't get the point why you would publish some OPP in the DT
> when the hardware which it describes doesn't support it.
>
> This may be already discussed when DT support was added to OPP library,
> IMO if for some reason the firmware/boot entity disables some of the
> OPPs, then it can append OPPs in DT with the state(enabled/disabled).
> But this needs extension of current binding.

you could also have reduced OPP set which needs to be invoked, appending 
wont really work if cpufreq table is built as part of probe - it kind of 
creates all kind of races which I would really like to avoid.


[1] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/mach-omap2/opp4xxx_data.c#n134
[2] 
http://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/arch/arm/mach-omap2/id.c#n290
Sudeep KarkadaNagesha July 31, 2013, 3:28 p.m. UTC | #7
On 31/07/13 15:46, Nishanth Menon wrote:
> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
>> On 30/07/13 21:48, Nishanth Menon wrote:
[...]
>>> This should setup stage for many of the work we have been trying to
>>> figure out on AM/OMAP and few other processors which has to depend on
>>> few sets of OPPs which may not be supported on various platforms.
>>>
>> I still don't get the point why you would publish some OPP in the DT
>> when the hardware which it describes doesn't support it.
>>
>> This may be already discussed when DT support was added to OPP library,
>> IMO if for some reason the firmware/boot entity disables some of the
>> OPPs, then it can append OPPs in DT with the state(enabled/disabled).
>> But this needs extension of current binding.
> 
> you could also have reduced OPP set which needs to be invoked, appending 
> wont really work if cpufreq table is built as part of probe - it kind of 
> creates all kind of races which I would really like to avoid.
> 
IIUC opp_set_availability(opp_enable/opp_disable) is designed for such
use-case ? Currently there are no users of this API but I see it fits
your use case.

Even with multiple OPP sets listed in DT as you described, you need to
read those fuses and chose the right set of OPPs. Instead you can use
opp_en(/dis)able methods to do that ?

Regards,
Sudeep

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Rutland July 31, 2013, 3:29 p.m. UTC | #8
On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
> > On 30/07/13 21:48, Nishanth Menon wrote:
> >> On 07/30/2013 01:34 PM, Stephen Warren wrote:
> >>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
> >>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
> >>>>
> >>>> If more than one similar devices share the same OPPs, currently we
> >>>> need to replicate the OPP entries in all the nodes.
> >>>>
> >>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
> >>>> OPPs and only that node is referred irrespective of the logical cpu
> >>>> accessing it. Alternatively to support cpuhotplug path, few drivers
> >>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
> >>>> of the node with which the current node shares the operating points.
> >>>>
> >>>> This patch adds support to specify the phandle in the operating points
> >>>> of any device node, where the node specified by the phandle holds the
> >>>> actual OPPs.
> >>>
> >>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> >>>
> >>>> +Optional properties:
> >>>> +- operating-points-phandle: phandle to the device node with which this
> >>>
> >>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
> >>>
> >>> I haven't thought at all about whether this change conceptually makes sense.
> >>>
> >>
> >> They may not really be shared- we could have phandle list even. one
> >> might have optional OPP sets for a chip family that one may  - I was
> >> about to suggest something similar to pinctrl
> >>
> > I am not sure if I follow you here, if each chip family has its unique
> > set of OPPs, why do we need to represent all of them together ?
> > IIUC you are thinking about having these in include dts file, used by
> > multiple chip/board dts.
> >
> >> operating-points-names = "default", "performance", "cheapboard-config" ;)
> >> operating-points-0 = <&...>
> >> operating-points-1 = <&...>
> >> operating-points-2 = <&...>
> >>
> > This looks more like a PM policy.
> 
> Let me try to explain since SoCs such as OMAP/AM family dont make life 
> trivial :)..
> 
> An legacy example[1][2]
> 
> SoC DM explains that the chip is capable of X opps:
> opp1, 2 - for all devices
> opp1,2, 3 - if efuse bit X@y is set
> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors 
> requirements (including additional features A, B is enabled).
> 
> So, the same chip family has a hardware feature - not just as a pm 
> policy of selecting among a set of OPPs which opp to work on, but the 
> actual set of OPPs are actually options in themselves that is selected 
> based on board's SoC selection.

This sounds like we're describing a set of features not applicable to
the device, then removing them, rather than only describing those
features applicable to the device. If you have to probe to figure out
which values in the dt are applicable, I'm not sure I see the benefit of
describing said values in dt.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon July 31, 2013, 3:53 p.m. UTC | #9
On 07/31/2013 10:28 AM, Sudeep KarkadaNagesha wrote:
> On 31/07/13 15:46, Nishanth Menon wrote:
>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
>>> On 30/07/13 21:48, Nishanth Menon wrote:
> [...]
>>>> This should setup stage for many of the work we have been trying to
>>>> figure out on AM/OMAP and few other processors which has to depend on
>>>> few sets of OPPs which may not be supported on various platforms.
>>>>
>>> I still don't get the point why you would publish some OPP in the DT
>>> when the hardware which it describes doesn't support it.
>>>
>>> This may be already discussed when DT support was added to OPP library,
>>> IMO if for some reason the firmware/boot entity disables some of the
>>> OPPs, then it can append OPPs in DT with the state(enabled/disabled).
>>> But this needs extension of current binding.
>>
>> you could also have reduced OPP set which needs to be invoked, appending
>> wont really work if cpufreq table is built as part of probe - it kind of
>> creates all kind of races which I would really like to avoid.
>>
> IIUC opp_set_availability(opp_enable/opp_disable) is designed for such
> use-case ? Currently there are no users of this API but I see it fits
> your use case.
>
> Even with multiple OPP sets listed in DT as you described, you need to
> read those fuses and chose the right set of OPPs. Instead you can use
> opp_en(/dis)able methods to do that ?
>
yes when the efuse data is present, but look at the other case I had 
also pointed at:

Lets take an example: SoC X has OPPs 1,2,3,4

Same SoC is used on Board A and B.
Board A meets with all SoC vendor requirements for routing, IR drop 
limits etc
Board B *does not* meet with all SoC vendor requirements for routing, IR 
drop limits etc

we no longer have board files, board will have to have a mechanism to 
"state it is not optimal configuration".

A real life example is BeagleBoard Xm and another product board(which I 
cannot mention) -both use OMAP3630 1GHz part. 1GHz requirements are met 
on BeagleBoard Xm, but on the product board it is not. Chip used is 
exactly the same, we dont have "dts property" to mention "yes, this 
board meets SoC data manual and associated documentation requirement" - 
instead what we do have is what is the chip capable of doing.

opp_enable/disable wont work here unless there is board specific 
"properties" we introduce. However, board file could choose "low 
performance" option of OPPs.

the opp_enable/disable wont scale there. Further, opp_add is done 
enmasse by cpufreq-cpu0 and and cpufreq table is built off it, there is 
no option of SoC specific modification to the table (opportunity to do 
opp_enable/disable) there - not something that cannot be fixed, and 
eventually will be, but not there right now.
Nishanth Menon July 31, 2013, 3:58 p.m. UTC | #10
On 07/31/2013 10:29 AM, Mark Rutland wrote:
> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
>>> On 30/07/13 21:48, Nishanth Menon wrote:
>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>>>>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>>>>>
>>>>>> If more than one similar devices share the same OPPs, currently we
>>>>>> need to replicate the OPP entries in all the nodes.
>>>>>>
>>>>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
>>>>>> OPPs and only that node is referred irrespective of the logical cpu
>>>>>> accessing it. Alternatively to support cpuhotplug path, few drivers
>>>>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
>>>>>> of the node with which the current node shares the operating points.
>>>>>>
>>>>>> This patch adds support to specify the phandle in the operating points
>>>>>> of any device node, where the node specified by the phandle holds the
>>>>>> actual OPPs.
>>>>>
>>>>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
>>>>>
>>>>>> +Optional properties:
>>>>>> +- operating-points-phandle: phandle to the device node with which this
>>>>>
>>>>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
>>>>>
>>>>> I haven't thought at all about whether this change conceptually makes sense.
>>>>>
>>>>
>>>> They may not really be shared- we could have phandle list even. one
>>>> might have optional OPP sets for a chip family that one may  - I was
>>>> about to suggest something similar to pinctrl
>>>>
>>> I am not sure if I follow you here, if each chip family has its unique
>>> set of OPPs, why do we need to represent all of them together ?
>>> IIUC you are thinking about having these in include dts file, used by
>>> multiple chip/board dts.
>>>
>>>> operating-points-names = "default", "performance", "cheapboard-config" ;)
>>>> operating-points-0 = <&...>
>>>> operating-points-1 = <&...>
>>>> operating-points-2 = <&...>
>>>>
>>> This looks more like a PM policy.
>>
>> Let me try to explain since SoCs such as OMAP/AM family dont make life
>> trivial :)..
>>
>> An legacy example[1][2]
>>
>> SoC DM explains that the chip is capable of X opps:
>> opp1, 2 - for all devices
>> opp1,2, 3 - if efuse bit X@y is set
>> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
>> requirements (including additional features A, B is enabled).
>>
>> So, the same chip family has a hardware feature - not just as a pm
>> policy of selecting among a set of OPPs which opp to work on, but the
>> actual set of OPPs are actually options in themselves that is selected
>> based on board's SoC selection.
>
> This sounds like we're describing a set of features not applicable to
> the device, then removing them, rather than only describing those
> features applicable to the device. If you have to probe to figure out
> which values in the dt are applicable, I'm not sure I see the benefit of
> describing said values in dt.

Device has *options* of operating points sets it can operate at. It is 
not like "these are not applicable" for the device.

DT does have to describe the hardware capability - that was it's entire 
intent. operating points are valid configurations where it can be 
operated at - and when you have options of configurations you need to 
choose from based on the board you are using it on, it still retains 
"hardware behavior" aspect.

Hope that explains.
Mark Rutland July 31, 2013, 4:11 p.m. UTC | #11
On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
> On 07/31/2013 10:29 AM, Mark Rutland wrote:
> > On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
> >> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
> >>> On 30/07/13 21:48, Nishanth Menon wrote:
> >>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
> >>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
> >>>>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
> >>>>>>
> >>>>>> If more than one similar devices share the same OPPs, currently we
> >>>>>> need to replicate the OPP entries in all the nodes.
> >>>>>>
> >>>>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
> >>>>>> OPPs and only that node is referred irrespective of the logical cpu
> >>>>>> accessing it. Alternatively to support cpuhotplug path, few drivers
> >>>>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
> >>>>>> of the node with which the current node shares the operating points.
> >>>>>>
> >>>>>> This patch adds support to specify the phandle in the operating points
> >>>>>> of any device node, where the node specified by the phandle holds the
> >>>>>> actual OPPs.
> >>>>>
> >>>>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> >>>>>
> >>>>>> +Optional properties:
> >>>>>> +- operating-points-phandle: phandle to the device node with which this
> >>>>>
> >>>>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
> >>>>>
> >>>>> I haven't thought at all about whether this change conceptually makes sense.
> >>>>>
> >>>>
> >>>> They may not really be shared- we could have phandle list even. one
> >>>> might have optional OPP sets for a chip family that one may  - I was
> >>>> about to suggest something similar to pinctrl
> >>>>
> >>> I am not sure if I follow you here, if each chip family has its unique
> >>> set of OPPs, why do we need to represent all of them together ?
> >>> IIUC you are thinking about having these in include dts file, used by
> >>> multiple chip/board dts.
> >>>
> >>>> operating-points-names = "default", "performance", "cheapboard-config" ;)
> >>>> operating-points-0 = <&...>
> >>>> operating-points-1 = <&...>
> >>>> operating-points-2 = <&...>
> >>>>
> >>> This looks more like a PM policy.
> >>
> >> Let me try to explain since SoCs such as OMAP/AM family dont make life
> >> trivial :)..
> >>
> >> An legacy example[1][2]
> >>
> >> SoC DM explains that the chip is capable of X opps:
> >> opp1, 2 - for all devices
> >> opp1,2, 3 - if efuse bit X@y is set
> >> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
> >> requirements (including additional features A, B is enabled).
> >>
> >> So, the same chip family has a hardware feature - not just as a pm
> >> policy of selecting among a set of OPPs which opp to work on, but the
> >> actual set of OPPs are actually options in themselves that is selected
> >> based on board's SoC selection.
> >
> > This sounds like we're describing a set of features not applicable to
> > the device, then removing them, rather than only describing those
> > features applicable to the device. If you have to probe to figure out
> > which values in the dt are applicable, I'm not sure I see the benefit of
> > describing said values in dt.
> 
> Device has *options* of operating points sets it can operate at. It is 
> not like "these are not applicable" for the device.

I don't follow.

In the example above, if efuse bit X@y is not set, opp3 is not
applicable, but we're describing it in dt. It's not an option for the
particular device, yet it appears in the device's dt.

For opp4, it's even worse, as you're suggesting we describe an option
for the device that requires the driver to use some additional platform
knowledge to come to the conclusion that it cannot use. That sounds like
device knowledge internal to a driver, not how you describe an instance
of a device to an OS.

Have I misunderstood something here?

> 
> DT does have to describe the hardware capability - that was it's entire 
> intent. operating points are valid configurations where it can be 
> operated at - and when you have options of configurations you need to 
> choose from based on the board you are using it on, it still retains 
> "hardware behavior" aspect.

The dt should describe the particular board you're running on. As I see
it what you're suggesting is equivalent to describing some hardware in
the dt that isn't actually present, then relying on the OS to poke
around somewhere else, figure out that the hardware isn't present, and
then forget that the dt described it.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon July 31, 2013, 4:27 p.m. UTC | #12
On 07/31/2013 11:11 AM, Mark Rutland wrote:
> On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
>> On 07/31/2013 10:29 AM, Mark Rutland wrote:
>>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
>>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
>>>>> On 30/07/13 21:48, Nishanth Menon wrote:
>>>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
>>>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>>>>>>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>>>>>>>
>>>>>>>> If more than one similar devices share the same OPPs, currently we
>>>>>>>> need to replicate the OPP entries in all the nodes.
>>>>>>>>
>>>>>>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
>>>>>>>> OPPs and only that node is referred irrespective of the logical cpu
>>>>>>>> accessing it. Alternatively to support cpuhotplug path, few drivers
>>>>>>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
>>>>>>>> of the node with which the current node shares the operating points.
>>>>>>>>
>>>>>>>> This patch adds support to specify the phandle in the operating points
>>>>>>>> of any device node, where the node specified by the phandle holds the
>>>>>>>> actual OPPs.
>>>>>>>
>>>>>>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
>>>>>>>
>>>>>>>> +Optional properties:
>>>>>>>> +- operating-points-phandle: phandle to the device node with which this
>>>>>>>
>>>>>>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
>>>>>>>
>>>>>>> I haven't thought at all about whether this change conceptually makes sense.
>>>>>>>
>>>>>>
>>>>>> They may not really be shared- we could have phandle list even. one
>>>>>> might have optional OPP sets for a chip family that one may  - I was
>>>>>> about to suggest something similar to pinctrl
>>>>>>
>>>>> I am not sure if I follow you here, if each chip family has its unique
>>>>> set of OPPs, why do we need to represent all of them together ?
>>>>> IIUC you are thinking about having these in include dts file, used by
>>>>> multiple chip/board dts.
>>>>>
>>>>>> operating-points-names = "default", "performance", "cheapboard-config" ;)
>>>>>> operating-points-0 = <&...>
>>>>>> operating-points-1 = <&...>
>>>>>> operating-points-2 = <&...>
>>>>>>
>>>>> This looks more like a PM policy.
>>>>
>>>> Let me try to explain since SoCs such as OMAP/AM family dont make life
>>>> trivial :)..
>>>>
>>>> An legacy example[1][2]
>>>>
>>>> SoC DM explains that the chip is capable of X opps:
>>>> opp1, 2 - for all devices
>>>> opp1,2, 3 - if efuse bit X@y is set
>>>> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
>>>> requirements (including additional features A, B is enabled).
>>>>
>>>> So, the same chip family has a hardware feature - not just as a pm
>>>> policy of selecting among a set of OPPs which opp to work on, but the
>>>> actual set of OPPs are actually options in themselves that is selected
>>>> based on board's SoC selection.
>>>
>>> This sounds like we're describing a set of features not applicable to
>>> the device, then removing them, rather than only describing those
>>> features applicable to the device. If you have to probe to figure out
>>> which values in the dt are applicable, I'm not sure I see the benefit of
>>> describing said values in dt.
>>
>> Device has *options* of operating points sets it can operate at. It is
>> not like "these are not applicable" for the device.
>
> I don't follow.
>
> In the example above, if efuse bit X@y is not set, opp3 is not
> applicable, but we're describing it in dt. It's not an option for the
> particular device, yet it appears in the device's dt.
This one is easy - opp_enable/disable as discussed in 
http://marc.info/?l=linux-pm&m=137528631125365&w=2 should probably help.
>
> For opp4, it's even worse, as you're suggesting we describe an option
> for the device that requires the driver to use some additional platform
> knowledge to come to the conclusion that it cannot use. That sounds like

Precisely.. it wont have that knowledge and should not need that 
knowledge. See explanation above.

Specific examples: SoC vendors try to squeeze the max out of the chip, 
when voltage values are defined, they need to consider board markets 
that they try to address, pricepoints etc.. too many vectors.. not all 
board manufacturers care to meet SoC vendor requirements as they may not 
care about picking up the full potential of the chip - example - 
usecases on OMAP where ARM is seldom used and max DSP is used (video 
usecases) and others so they use a high performance chip, refuse to 
optimize vdd_mpu rail, dont care too much about higher ARM OPPs. Yeah, I 
could always tell them to hand edit the OPP entries and maintain kernel 
forks, but that is never the right thing to do.


> device knowledge internal to a driver, not how you describe an instance
> of a device to an OS.

OPP has never been a device - it is a performance point at which to 
operate a device. I am not sure if we are discussing about phandle 
definition of OPP is an issue or options of operating-point sets is an 
issue now.

>
> Have I misunderstood something here?

Are you suggesting we have OPP tables per board?

>
>>
>> DT does have to describe the hardware capability - that was it's entire
>> intent. operating points are valid configurations where it can be
>> operated at - and when you have options of configurations you need to
>> choose from based on the board you are using it on, it still retains
>> "hardware behavior" aspect.
>
> The dt should describe the particular board you're running on. As I see
> it what you're suggesting is equivalent to describing some hardware in
> the dt that isn't actually present, then relying on the OS to poke
> around somewhere else, figure out that the hardware isn't present, and
> then forget that the dt described it.
I will buy that eventual dtb should contain some way to choose the OPP 
that the particular board can operate on.

SoC dtsi is what we define, this allows multiple board dts to use them. 
the moment we start defining OPPs per board, all mayhem breaks loose.

SoC dtsi provides options for the SoC to be operated upon, it is like 
saying I have 10 Uarts, but board dts chooses to enable the ones it 
uses. pinctrl we do the same. why cant we do with operating-points as well?
Sudeep KarkadaNagesha July 31, 2013, 4:40 p.m. UTC | #13
On 31/07/13 16:53, Nishanth Menon wrote:
> On 07/31/2013 10:28 AM, Sudeep KarkadaNagesha wrote:
>> On 31/07/13 15:46, Nishanth Menon wrote:
>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
>>>> On 30/07/13 21:48, Nishanth Menon wrote:
>> [...]
>>>>> This should setup stage for many of the work we have been trying to
>>>>> figure out on AM/OMAP and few other processors which has to depend on
>>>>> few sets of OPPs which may not be supported on various platforms.
>>>>>
>>>> I still don't get the point why you would publish some OPP in the DT
>>>> when the hardware which it describes doesn't support it.
>>>>
>>>> This may be already discussed when DT support was added to OPP library,
>>>> IMO if for some reason the firmware/boot entity disables some of the
>>>> OPPs, then it can append OPPs in DT with the state(enabled/disabled).
>>>> But this needs extension of current binding.
>>>
>>> you could also have reduced OPP set which needs to be invoked, appending
>>> wont really work if cpufreq table is built as part of probe - it kind of
>>> creates all kind of races which I would really like to avoid.
>>>
>> IIUC opp_set_availability(opp_enable/opp_disable) is designed for such
>> use-case ? Currently there are no users of this API but I see it fits
>> your use case.
>>
>> Even with multiple OPP sets listed in DT as you described, you need to
>> read those fuses and chose the right set of OPPs. Instead you can use
>> opp_en(/dis)able methods to do that ?
>>
> yes when the efuse data is present, but look at the other case I had 
> also pointed at:
> 
> Lets take an example: SoC X has OPPs 1,2,3,4
> 
> Same SoC is used on Board A and B.
> Board A meets with all SoC vendor requirements for routing, IR drop 
> limits etc
> Board B *does not* meet with all SoC vendor requirements for routing, IR 
> drop limits etc
> 
> we no longer have board files, board will have to have a mechanism to 
> "state it is not optimal configuration".
> 
But we still have DTS per board(I assume each board will at-least
different in IRQ/GPIO assignments or even different pin-mux configuration)

> A real life example is BeagleBoard Xm and another product board(which I 
> cannot mention) -both use OMAP3630 1GHz part. 1GHz requirements are met 
> on BeagleBoard Xm, but on the product board it is not. Chip used is 
> exactly the same, we dont have "dts property" to mention "yes, this 
> board meets SoC data manual and associated documentation requirement" - 
> instead what we do have is what is the chip capable of doing.
> 
If SoC gives configuration w.r.t OPPs, then its board property like
pin-mux. Why is it not possible to have 1GHz only in BeagleBoard Xm and
not in product board X ?

> opp_enable/disable wont work here unless there is board specific 
> "properties" we introduce. However, board file could choose "low 
> performance" option of OPPs.
> 
Again board file choice of selecting OPPs is policy and DT must describe
all the features board supports.

> the opp_enable/disable wont scale there. Further, opp_add is done 
> enmasse by cpufreq-cpu0 and and cpufreq table is built off it, there is 
> no option of SoC specific modification to the table (opportunity to do 
> opp_enable/disable) there - not something that cannot be fixed, and 
> eventually will be, but not there right now.
> 
Freescale iMX6 seems to be using it, not use if its SoC level or board
level.(imx6q_opp_check_1p2ghz in arch/arm/mach-imx/mach-imx6q.c)


I just had a look @arch/arm/boot/dts/omap36xx.dtsi and
omap3-beagle-xm.dts. This is my opinion(if you can't handle this
dynamically):

Now you have omap36xx SoC on 2 products X and BeagleBoard Xm.
omap36xx can support up to 1GHz but depends on actual products.
So its better not to publish OPPs in omap36xx.dtsi but leave to product
X and BeagleBoard Xm to describe what that hardware supports.

Regards,
Sudeep




--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon July 31, 2013, 7:13 p.m. UTC | #14
On 17:40-20130731, Sudeep KarkadaNagesha wrote:
> On 31/07/13 16:53, Nishanth Menon wrote:
> > On 07/31/2013 10:28 AM, Sudeep KarkadaNagesha wrote:
> >> On 31/07/13 15:46, Nishanth Menon wrote:
> >>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
> >>>> On 30/07/13 21:48, Nishanth Menon wrote:
> >> [...]
> >>>>> This should setup stage for many of the work we have been trying to
> >>>>> figure out on AM/OMAP and few other processors which has to depend on
> >>>>> few sets of OPPs which may not be supported on various platforms.
> >>>>>
> >>>> I still don't get the point why you would publish some OPP in the DT
> >>>> when the hardware which it describes doesn't support it.
> >>>>
> >>>> This may be already discussed when DT support was added to OPP library,
> >>>> IMO if for some reason the firmware/boot entity disables some of the
> >>>> OPPs, then it can append OPPs in DT with the state(enabled/disabled).
> >>>> But this needs extension of current binding.
> >>>
> >>> you could also have reduced OPP set which needs to be invoked, appending
> >>> wont really work if cpufreq table is built as part of probe - it kind of
> >>> creates all kind of races which I would really like to avoid.
> >>>
> >> IIUC opp_set_availability(opp_enable/opp_disable) is designed for such
> >> use-case ? Currently there are no users of this API but I see it fits
> >> your use case.
> >>
> >> Even with multiple OPP sets listed in DT as you described, you need to
> >> read those fuses and chose the right set of OPPs. Instead you can use
> >> opp_en(/dis)able methods to do that ?
> >>
> > yes when the efuse data is present, but look at the other case I had 
> > also pointed at:
> > 
> > Lets take an example: SoC X has OPPs 1,2,3,4
> > 
> > Same SoC is used on Board A and B.
> > Board A meets with all SoC vendor requirements for routing, IR drop 
> > limits etc
> > Board B *does not* meet with all SoC vendor requirements for routing, IR 
> > drop limits etc
> > 
> > we no longer have board files, board will have to have a mechanism to 
> > "state it is not optimal configuration".
> > 
> But we still have DTS per board(I assume each board will at-least
> different in IRQ/GPIO assignments or even different pin-mux configuration)

yes, of course.

> 
> > A real life example is BeagleBoard Xm and another product board(which I 
> > cannot mention) -both use OMAP3630 1GHz part. 1GHz requirements are met 
> > on BeagleBoard Xm, but on the product board it is not. Chip used is 
> > exactly the same, we dont have "dts property" to mention "yes, this 
> > board meets SoC data manual and associated documentation requirement" - 
> > instead what we do have is what is the chip capable of doing.
> > 
> If SoC gives configuration w.r.t OPPs, then its board property like
> pin-mux. Why is it not possible to have 1GHz only in BeagleBoard Xm and
> not in product board X ?
Unless we have two "phandles", we wont be able to do the same. Then
you'd want to standardize how we do that which is why I made the
proposal.

> 
> > opp_enable/disable wont work here unless there is board specific 
> > "properties" we introduce. However, board file could choose "low 
> > performance" option of OPPs.
> > 
> Again board file choice of selecting OPPs is policy and DT must describe
> all the features board supports.

Umm... I would probably call it "capability" based on selection of SoC
type, board design etc. as against "policy" of making a decision -
example, this device is too hot, so lets reduce the number of OPPs that
cpufreq chooses to operate on.

> 
> > the opp_enable/disable wont scale there. Further, opp_add is done 
> > enmasse by cpufreq-cpu0 and and cpufreq table is built off it, there is 
> > no option of SoC specific modification to the table (opportunity to do 
> > opp_enable/disable) there - not something that cannot be fixed, and 
> > eventually will be, but not there right now.
> > 
> Freescale iMX6 seems to be using it, not use if its SoC level or board
> level.(imx6q_opp_check_1p2ghz in arch/arm/mach-imx/mach-imx6q.c)

Yep, this is similar to the "efuse case" I had stated, and  given no
other option, that is the only way I will probably have to go
w.r.t OMAP, but given OMAP, iMX have similar challenges, I am hoping we
can have a definition that is generic enough for various SoCs to use.
(similar to beagle_opp_init in arch/arm/mach-omap2/board-omap3beagle.c -
which is pre-dts days usage)

> 
> 
> I just had a look @arch/arm/boot/dts/omap36xx.dtsi and
> omap3-beagle-xm.dts. This is my opinion(if you can't handle this
> dynamically):
> 
> Now you have omap36xx SoC on 2 products X and BeagleBoard Xm.
> omap36xx can support up to 1GHz but depends on actual products.
> So its better not to publish OPPs in omap36xx.dtsi but leave to product
> X and BeagleBoard Xm to describe what that hardware supports.
That is the pain I was trying to explain, having configuration options
defined in SoC dtsi providing all valid options available is more
sensible. It also solves the issue of multiple devices/clocks using
same definitions

Speaking as an SoC vendor, allowing board files define their own opp
tables is basically an invitation for disaster from a production risk
perspective, as, there are too many board variants out there and not all
developers are, umm... "OPP-aware" and end up with a maintenance burden
for rest of us (e.g. board X, board Y defining 1GHz differently) etc.
Nishanth Menon July 31, 2013, 7:55 p.m. UTC | #15
On 07/31/2013 02:13 PM, Nishanth Menon wrote:
[...]

> Unless we have two "phandles", we wont be able to do the same. Then
> you'd want to standardize how we do that which is why I made the
> proposal.
>

Let me try a slightly detailed proposal of what I am trying to suggest:

Usage option #1:
Legacy support.

	cpu@0 {
		compatible = "arm,cortex-a9";
		reg = <0>;
		next-level-cache = <&L2>;
		operating-points = <
			/* kHz    uV */
			792000  1100000
			396000  950000
			198000  850000
		>;
	};

Usage option #2:
Maintain only deltas in options from a base.

	cpu@0 {
		compatible = "arm,cortex-a9";
		reg = <0>;
		next-level-cache = <&L2>;
		operating-points-names = "base", "high-performance";
		operating-points-0 = <
			/* kHz    uV */
			792000  1100000
			396000  950000
			198000  850000
		>;
		operating-points-1 = <
			/* kHz    uV */
			1000000 1200000
		>;

	};

Usage option #3: (not compatible definition to #2)
	cpu@0 {
		compatible = "arm,cortex-a9";
		reg = <0>;
		next-level-cache = <&L2>;
		operating-points-names = "default", "high-performance";
		operating-points-0 = <
			/* kHz    uV */
			792000  1100000
			396000  950000
			198000  850000
		>;
		operating-points-1 = <
			/* kHz    uV */
			1000000 1200000
			792000  1100000
			396000  950000
			198000  850000
		>;

	};

Usage option #4 (along with option 3 or 2):
	cpu@1 {
		compatible = "arm,cortex-a9";
		reg = <0>;
		next-level-cache = <&L2>;
		operating-points-device = <&cpu0 high-performance>;
	};

Usage option #5 (along with option 1):
This is the step we are attempting to do in this patch as far as I 
understand.

	cpu@1 {
		compatible = "arm,cortex-a9";
		reg = <0>;
		next-level-cache = <&L2>;
		operating-points-device = <&cpu0>;
	};


board file override option:

&cpu0 {
	operating-points-select = "default";
}

This will prevent selection of high-performance even if efuse is set 
etc.. or force selection of high-performance independent of what efuse says.

This allows us:
a) To maintain dts in a separate repository without being dependent on 
frequencies in kernel code for opp_enable/disable.
b) reasonably proceed towards complete SoC entitlement
c) not have to deal with multiple OPP definitions per board file.


Does that make sense? or do we see concerns?
Stephen Warren July 31, 2013, 9:51 p.m. UTC | #16
On 07/31/2013 08:46 AM, Nishanth Menon wrote:
...
> Let me try to explain since SoCs such as OMAP/AM family dont make life
> trivial :)..
> 
> An legacy example[1][2]
> 
> SoC DM explains that the chip is capable of X opps:
> opp1, 2 - for all devices
> opp1,2, 3 - if efuse bit X@y is set
> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
> requirements (including additional features A, B is enabled).

Hopefully the text "board design meets SoC vendors requirements" means
something like "the board has a big fan capable of dissipating a lot of
heat" and not "the board manufacturer paid us a lot of money to license
the 'go faster' feature". The former could well be suitable to represent
in DT, the latter not.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Warren July 31, 2013, 9:59 p.m. UTC | #17
On 07/31/2013 10:11 AM, Mark Rutland wrote:
> On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
>> On 07/31/2013 10:29 AM, Mark Rutland wrote:
>>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
...
>>>> Let me try to explain since SoCs such as OMAP/AM family dont make life
>>>> trivial :)..
>>>>
>>>> An legacy example[1][2]
>>>>
>>>> SoC DM explains that the chip is capable of X opps:
>>>> opp1, 2 - for all devices
>>>> opp1,2, 3 - if efuse bit X@y is set
>>>> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
>>>> requirements (including additional features A, B is enabled).
>>>>
>>>> So, the same chip family has a hardware feature - not just as a pm
>>>> policy of selecting among a set of OPPs which opp to work on, but the
>>>> actual set of OPPs are actually options in themselves that is selected
>>>> based on board's SoC selection.
>>>
>>> This sounds like we're describing a set of features not applicable to
>>> the device, then removing them, rather than only describing those
>>> features applicable to the device. If you have to probe to figure out
>>> which values in the dt are applicable, I'm not sure I see the benefit of
>>> describing said values in dt.
>>
>> Device has *options* of operating points sets it can operate at. It is 
>> not like "these are not applicable" for the device.
> 
> I don't follow.
> 
> In the example above, if efuse bit X@y is not set, opp3 is not
> applicable, but we're describing it in dt. It's not an option for the
> particular device, yet it appears in the device's dt.
> 
> For opp4, it's even worse, as you're suggesting we describe an option
> for the device that requires the driver to use some additional platform
> knowledge to come to the conclusion that it cannot use. That sounds like
> device knowledge internal to a driver, not how you describe an instance
> of a device to an OS.
> 
> Have I misunderstood something here?

From the kernel's perspective here, a simple approach would be to say
the the DT passed to the kernel must contain exactly the set of OPPs
that the kernel should use. If this requires probing HW (fuses, board
knowledge, etc.), then the bootloader/... must do that and fix up the DT
to correctly represent the board at boot time (or choose the correct DT
from a set of options in flash) or whoever flashed the board must have
simply put the correct DT onto the board.

Of course, that simply pushes back the issue onto the flashing utility
or bootloader. Perhaps pushing stuff back there isn't a good holistic
solution, since we only get a clean kernel by making life suck for
someone else, and we'd perhaps better optimize the entire SW stack.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Aug. 1, 2013, 12:15 p.m. UTC | #18
On 15:51-20130731, Stephen Warren wrote:
> On 07/31/2013 08:46 AM, Nishanth Menon wrote:
> ...
> > Let me try to explain since SoCs such as OMAP/AM family dont make life
> > trivial :)..
> > 
> > An legacy example[1][2]
> > 
> > SoC DM explains that the chip is capable of X opps:
> > opp1, 2 - for all devices
> > opp1,2, 3 - if efuse bit X@y is set
> > opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
> > requirements (including additional features A, B is enabled).
> 
> Hopefully the text "board design meets SoC vendors requirements" means
> something like "the board has a big fan capable of dissipating a lot of
> heat" and not "the board manufacturer paid us a lot of money to license
> the 'go faster' feature". The former could well be suitable to represent
> in DT, the latter not.

Nope, these are technical requirements. In my company, these
guidelines are called Power Distribution Network guideline - which
control the IRDrop within reasonable limit, running DPLLs are varied
frequencies have different current draw characteristics, such that noise
limits, cleanup capacitors etc. For boards that dont care too much about
higher frequencies, they tend to skimp a little on caps and board
routing constraints to save on BOM (Bill Of Materials) cost.

I am sure similar constraints do exist in other SoC vendors as well.
Mark Rutland Aug. 1, 2013, 1:54 p.m. UTC | #19
On Wed, Jul 31, 2013 at 05:27:39PM +0100, Nishanth Menon wrote:
> On 07/31/2013 11:11 AM, Mark Rutland wrote:
> > On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
> >> On 07/31/2013 10:29 AM, Mark Rutland wrote:
> >>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
> >>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
> >>>>> On 30/07/13 21:48, Nishanth Menon wrote:
> >>>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
> >>>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
> >>>>>>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
> >>>>>>>>
> >>>>>>>> If more than one similar devices share the same OPPs, currently we
> >>>>>>>> need to replicate the OPP entries in all the nodes.
> >>>>>>>>
> >>>>>>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
> >>>>>>>> OPPs and only that node is referred irrespective of the logical cpu
> >>>>>>>> accessing it. Alternatively to support cpuhotplug path, few drivers
> >>>>>>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
> >>>>>>>> of the node with which the current node shares the operating points.
> >>>>>>>>
> >>>>>>>> This patch adds support to specify the phandle in the operating points
> >>>>>>>> of any device node, where the node specified by the phandle holds the
> >>>>>>>> actual OPPs.
> >>>>>>>
> >>>>>>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> >>>>>>>
> >>>>>>>> +Optional properties:
> >>>>>>>> +- operating-points-phandle: phandle to the device node with which this
> >>>>>>>
> >>>>>>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
> >>>>>>>
> >>>>>>> I haven't thought at all about whether this change conceptually makes sense.
> >>>>>>>
> >>>>>>
> >>>>>> They may not really be shared- we could have phandle list even. one
> >>>>>> might have optional OPP sets for a chip family that one may  - I was
> >>>>>> about to suggest something similar to pinctrl
> >>>>>>
> >>>>> I am not sure if I follow you here, if each chip family has its unique
> >>>>> set of OPPs, why do we need to represent all of them together ?
> >>>>> IIUC you are thinking about having these in include dts file, used by
> >>>>> multiple chip/board dts.
> >>>>>
> >>>>>> operating-points-names = "default", "performance", "cheapboard-config" ;)
> >>>>>> operating-points-0 = <&...>
> >>>>>> operating-points-1 = <&...>
> >>>>>> operating-points-2 = <&...>
> >>>>>>
> >>>>> This looks more like a PM policy.
> >>>>
> >>>> Let me try to explain since SoCs such as OMAP/AM family dont make life
> >>>> trivial :)..
> >>>>
> >>>> An legacy example[1][2]
> >>>>
> >>>> SoC DM explains that the chip is capable of X opps:
> >>>> opp1, 2 - for all devices
> >>>> opp1,2, 3 - if efuse bit X@y is set
> >>>> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
> >>>> requirements (including additional features A, B is enabled).
> >>>>
> >>>> So, the same chip family has a hardware feature - not just as a pm
> >>>> policy of selecting among a set of OPPs which opp to work on, but the
> >>>> actual set of OPPs are actually options in themselves that is selected
> >>>> based on board's SoC selection.
> >>>
> >>> This sounds like we're describing a set of features not applicable to
> >>> the device, then removing them, rather than only describing those
> >>> features applicable to the device. If you have to probe to figure out
> >>> which values in the dt are applicable, I'm not sure I see the benefit of
> >>> describing said values in dt.
> >>
> >> Device has *options* of operating points sets it can operate at. It is
> >> not like "these are not applicable" for the device.
> >
> > I don't follow.
> >
> > In the example above, if efuse bit X@y is not set, opp3 is not
> > applicable, but we're describing it in dt. It's not an option for the
> > particular device, yet it appears in the device's dt.
> This one is easy - opp_enable/disable as discussed in 
> http://marc.info/?l=linux-pm&m=137528631125365&w=2 should probably help.

Wrong link? There's no reference to opp_enable or opp_disable...

> >
> > For opp4, it's even worse, as you're suggesting we describe an option
> > for the device that requires the driver to use some additional platform
> > knowledge to come to the conclusion that it cannot use. That sounds like
> 
> Precisely.. it wont have that knowledge and should not need that 
> knowledge. See explanation above.
> 
> Specific examples: SoC vendors try to squeeze the max out of the chip, 
> when voltage values are defined, they need to consider board markets 
> that they try to address, pricepoints etc.. too many vectors.. not all 
> board manufacturers care to meet SoC vendor requirements as they may not 
> care about picking up the full potential of the chip - example - 
> usecases on OMAP where ARM is seldom used and max DSP is used (video 
> usecases) and others so they use a high performance chip, refuse to 
> optimize vdd_mpu rail, dont care too much about higher ARM OPPs. Yeah, I 
> could always tell them to hand edit the OPP entries and maintain kernel 
> forks, but that is never the right thing to do.

Sorry, but I still don't follow.

We seem to be going over two cases, which both feel wrong to me:

* One SoC used in multiple boards, where on some boards an OPP cannot be
  used because some requirement is not met. In this case, the board's
  dts (by including the SoC's dtsi) describes something that's not
  necessarily usable, and we seem to have no way to describe in the OPP
  table that the OPP is not usable for that board.

* Performance profiles, in which you have a set of OPP tables for
  "performance, "low-power", and whatever else. This arbitrary split
  seems like a configuration decision rather than a hardware description
  unless there is some hard limit that cannot be detected (e.g. the
  processor can function at some arbitrary high speed, but becomes hot
  enough to melt something, and there's no temperature sensor to handle
  this case dynamically).

Have I've misunderstood something?

> 
> 
> > device knowledge internal to a driver, not how you describe an instance
> > of a device to an OS.
> 
> OPP has never been a device - it is a performance point at which to 
> operate a device. I am not sure if we are discussing about phandle 
> definition of OPP is an issue or options of operating-point sets is an 
> issue now.
> 
> >
> > Have I misunderstood something here?
> 
> Are you suggesting we have OPP tables per board?

Yes, for the reasons I give above. Common OPP tabless can easily be
factored into separate include files to allow for arbitrary composition.

> 
> >
> >>
> >> DT does have to describe the hardware capability - that was it's entire
> >> intent. operating points are valid configurations where it can be
> >> operated at - and when you have options of configurations you need to
> >> choose from based on the board you are using it on, it still retains
> >> "hardware behavior" aspect.
> >
> > The dt should describe the particular board you're running on. As I see
> > it what you're suggesting is equivalent to describing some hardware in
> > the dt that isn't actually present, then relying on the OS to poke
> > around somewhere else, figure out that the hardware isn't present, and
> > then forget that the dt described it.
> I will buy that eventual dtb should contain some way to choose the OPP 
> that the particular board can operate on.
> 
> SoC dtsi is what we define, this allows multiple board dts to use them. 
> the moment we start defining OPPs per board, all mayhem breaks loose.
> 
> SoC dtsi provides options for the SoC to be operated upon, it is like 
> saying I have 10 Uarts, but board dts chooses to enable the ones it 
> uses. pinctrl we do the same. why cant we do with operating-points as well?

That's a possibility if we define a standard mechanism for stating OPPs
are unusable (rather than having to probe the device to figure that
out).

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Aug. 1, 2013, 4:25 p.m. UTC | #20
On 08/01/2013 08:54 AM, Mark Rutland wrote:
> On Wed, Jul 31, 2013 at 05:27:39PM +0100, Nishanth Menon wrote:
>> On 07/31/2013 11:11 AM, Mark Rutland wrote:
>>> On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
>>>> On 07/31/2013 10:29 AM, Mark Rutland wrote:
>>>>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
>>>>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
>>>>>>> On 30/07/13 21:48, Nishanth Menon wrote:
>>>>>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
>>>>>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>>>>>>>>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
>>>>>>>>>>
>>>>>>>>>> If more than one similar devices share the same OPPs, currently we
>>>>>>>>>> need to replicate the OPP entries in all the nodes.
>>>>>>>>>>
>>>>>>>>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
>>>>>>>>>> OPPs and only that node is referred irrespective of the logical cpu
>>>>>>>>>> accessing it. Alternatively to support cpuhotplug path, few drivers
>>>>>>>>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
>>>>>>>>>> of the node with which the current node shares the operating points.
>>>>>>>>>>
>>>>>>>>>> This patch adds support to specify the phandle in the operating points
>>>>>>>>>> of any device node, where the node specified by the phandle holds the
>>>>>>>>>> actual OPPs.
>>>>>>>>>
>>>>>>>>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
>>>>>>>>>
>>>>>>>>>> +Optional properties:
>>>>>>>>>> +- operating-points-phandle: phandle to the device node with which this
>>>>>>>>>
>>>>>>>>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
>>>>>>>>>
>>>>>>>>> I haven't thought at all about whether this change conceptually makes sense.
>>>>>>>>>
>>>>>>>>
>>>>>>>> They may not really be shared- we could have phandle list even. one
>>>>>>>> might have optional OPP sets for a chip family that one may  - I was
>>>>>>>> about to suggest something similar to pinctrl
>>>>>>>>
>>>>>>> I am not sure if I follow you here, if each chip family has its unique
>>>>>>> set of OPPs, why do we need to represent all of them together ?
>>>>>>> IIUC you are thinking about having these in include dts file, used by
>>>>>>> multiple chip/board dts.
>>>>>>>
>>>>>>>> operating-points-names = "default", "performance", "cheapboard-config" ;)
>>>>>>>> operating-points-0 = <&...>
>>>>>>>> operating-points-1 = <&...>
>>>>>>>> operating-points-2 = <&...>
>>>>>>>>
>>>>>>> This looks more like a PM policy.
>>>>>>
>>>>>> Let me try to explain since SoCs such as OMAP/AM family dont make life
>>>>>> trivial :)..
>>>>>>
>>>>>> An legacy example[1][2]
>>>>>>
>>>>>> SoC DM explains that the chip is capable of X opps:
>>>>>> opp1, 2 - for all devices
>>>>>> opp1,2, 3 - if efuse bit X@y is set
>>>>>> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
>>>>>> requirements (including additional features A, B is enabled).
>>>>>>
>>>>>> So, the same chip family has a hardware feature - not just as a pm
>>>>>> policy of selecting among a set of OPPs which opp to work on, but the
>>>>>> actual set of OPPs are actually options in themselves that is selected
>>>>>> based on board's SoC selection.
>>>>>
>>>>> This sounds like we're describing a set of features not applicable to
>>>>> the device, then removing them, rather than only describing those
>>>>> features applicable to the device. If you have to probe to figure out
>>>>> which values in the dt are applicable, I'm not sure I see the benefit of
>>>>> describing said values in dt.
>>>>
>>>> Device has *options* of operating points sets it can operate at. It is
>>>> not like "these are not applicable" for the device.
>>>
>>> I don't follow.
>>>
>>> In the example above, if efuse bit X@y is not set, opp3 is not
>>> applicable, but we're describing it in dt. It's not an option for the
>>> particular device, yet it appears in the device's dt.
>> This one is easy - opp_enable/disable as discussed in
>> http://marc.info/?l=linux-pm&m=137528631125365&w=2 should probably help.
>
> Wrong link? There's no reference to opp_enable or opp_disable...

Darned! sorry about that.. I think this is what I wanted to point yesterday.

http://marc.info/?l=linux-pm&m=137528603225295&w=2

>
>>>
>>> For opp4, it's even worse, as you're suggesting we describe an option
>>> for the device that requires the driver to use some additional platform
>>> knowledge to come to the conclusion that it cannot use. That sounds like
>>
>> Precisely.. it wont have that knowledge and should not need that
>> knowledge. See explanation above.
>>
>> Specific examples: SoC vendors try to squeeze the max out of the chip,
>> when voltage values are defined, they need to consider board markets
>> that they try to address, pricepoints etc.. too many vectors.. not all
>> board manufacturers care to meet SoC vendor requirements as they may not
>> care about picking up the full potential of the chip - example -
>> usecases on OMAP where ARM is seldom used and max DSP is used (video
>> usecases) and others so they use a high performance chip, refuse to
>> optimize vdd_mpu rail, dont care too much about higher ARM OPPs. Yeah, I
>> could always tell them to hand edit the OPP entries and maintain kernel
>> forks, but that is never the right thing to do.
>
> Sorry, but I still don't follow.
>
> We seem to be going over two cases, which both feel wrong to me:
>
> * One SoC used in multiple boards, where on some boards an OPP cannot be
>    used because some requirement is not met. In this case, the board's
>    dts (by including the SoC's dtsi) describes something that's not
>    necessarily usable, and we seem to have no way to describe in the OPP
>    table that the OPP is not usable for that board.

not at the moment at least - at least in the way we have described the 
OPP in dts today.

>
> * Performance profiles, in which you have a set of OPP tables for
>    "performance, "low-power", and whatever else. This arbitrary split
>    seems like a configuration decision rather than a hardware description
>    unless there is some hard limit that cannot be detected (e.g. the
>    processor can function at some arbitrary high speed, but becomes hot
>    enough to melt something, and there's no temperature sensor to handle
>    this case dynamically).

precisely -> I think I point this out in this thread:
http://marc.info/?l=devicetree&m=137535932402560&w=2

>
> Have I've misunderstood something?
>
>>
>>
>>> device knowledge internal to a driver, not how you describe an instance
>>> of a device to an OS.
>>
>> OPP has never been a device - it is a performance point at which to
>> operate a device. I am not sure if we are discussing about phandle
>> definition of OPP is an issue or options of operating-point sets is an
>> issue now.
>>
>>>
>>> Have I misunderstood something here?
>>
>> Are you suggesting we have OPP tables per board?
>
> Yes, for the reasons I give above. Common OPP tabless can easily be
> factored into separate include files to allow for arbitrary composition.

Hmm.. that could be one other way to do it..


>
>>
>>>
>>>>
>>>> DT does have to describe the hardware capability - that was it's entire
>>>> intent. operating points are valid configurations where it can be
>>>> operated at - and when you have options of configurations you need to
>>>> choose from based on the board you are using it on, it still retains
>>>> "hardware behavior" aspect.
>>>
>>> The dt should describe the particular board you're running on. As I see
>>> it what you're suggesting is equivalent to describing some hardware in
>>> the dt that isn't actually present, then relying on the OS to poke
>>> around somewhere else, figure out that the hardware isn't present, and
>>> then forget that the dt described it.
>> I will buy that eventual dtb should contain some way to choose the OPP
>> that the particular board can operate on.
>>
>> SoC dtsi is what we define, this allows multiple board dts to use them.
>> the moment we start defining OPPs per board, all mayhem breaks loose.
>>
>> SoC dtsi provides options for the SoC to be operated upon, it is like
>> saying I have 10 Uarts, but board dts chooses to enable the ones it
>> uses. pinctrl we do the same. why cant we do with operating-points as well?
>
> That's a possibility if we define a standard mechanism for stating OPPs
> are unusable (rather than having to probe the device to figure that
> out).
>

I did make a proposal here: 
http://marc.info/?l=devicetree&m=137530056230441&w=2

Do you see it making sense, If yes, I can help flesh out the idea with code.
Stephen Warren Aug. 1, 2013, 4:46 p.m. UTC | #21
On 08/01/2013 06:15 AM, Nishanth Menon wrote:
> On 15:51-20130731, Stephen Warren wrote:
>> On 07/31/2013 08:46 AM, Nishanth Menon wrote:
>> ...
>>> Let me try to explain since SoCs such as OMAP/AM family dont make life
>>> trivial :)..
>>>
>>> An legacy example[1][2]
>>>
>>> SoC DM explains that the chip is capable of X opps:
>>> opp1, 2 - for all devices
>>> opp1,2, 3 - if efuse bit X@y is set
>>> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
>>> requirements (including additional features A, B is enabled).
>>
>> Hopefully the text "board design meets SoC vendors requirements" means
>> something like "the board has a big fan capable of dissipating a lot of
>> heat" and not "the board manufacturer paid us a lot of money to license
>> the 'go faster' feature". The former could well be suitable to represent
>> in DT, the latter not.
> 
> Nope, these are technical requirements. In my company, these
> guidelines are called Power Distribution Network guideline - which
> control the IRDrop within reasonable limit, running DPLLs are varied
> frequencies have different current draw characteristics, such that noise
> limits, cleanup capacitors etc. For boards that dont care too much about
> higher frequencies, they tend to skimp a little on caps and board
> routing constraints to save on BOM (Bill Of Materials) cost.
> 
> I am sure similar constraints do exist in other SoC vendors as well.

Great, sounds good then.

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Warren Aug. 1, 2013, 4:49 p.m. UTC | #22
On 08/01/2013 07:54 AM, Mark Rutland wrote:
...
> We seem to be going over two cases, which both feel wrong to me:
> 
> * One SoC used in multiple boards, where on some boards an OPP cannot be
>   used because some requirement is not met. In this case, the board's
>   dts (by including the SoC's dtsi) describes something that's not
>   necessarily usable, and we seem to have no way to describe in the OPP
>   table that the OPP is not usable for that board.

There are probably a lot of examples of this already. For example, for
pinctrl, people often want the SoC .dtsi file to include "pin
configuration nodes" (see
Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt) for many
common pinmux configurations in the SoC .dtsi file, so that board files
can simply refer to the already-existing nodes rather than having to
write everything from scratch. Obviously, not all common configurations
are used by every board.

...
>> Are you suggesting we have OPP tables per board?
> 
> Yes, for the reasons I give above. Common OPP tabless can easily be
> factored into separate include files to allow for arbitrary composition.

Using separate include files sounds like a reasonable idea.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Rutland Aug. 2, 2013, 1:15 p.m. UTC | #23
On Thu, Aug 01, 2013 at 05:25:06PM +0100, Nishanth Menon wrote:
> On 08/01/2013 08:54 AM, Mark Rutland wrote:
> > On Wed, Jul 31, 2013 at 05:27:39PM +0100, Nishanth Menon wrote:
> >> On 07/31/2013 11:11 AM, Mark Rutland wrote:
> >>> On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
> >>>> On 07/31/2013 10:29 AM, Mark Rutland wrote:
> >>>>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
> >>>>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
> >>>>>>> On 30/07/13 21:48, Nishanth Menon wrote:
> >>>>>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
> >>>>>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
> >>>>>>>>>> From: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
> >>>>>>>>>>
> >>>>>>>>>> If more than one similar devices share the same OPPs, currently we
> >>>>>>>>>> need to replicate the OPP entries in all the nodes.
> >>>>>>>>>>
> >>>>>>>>>> Few drivers like cpufreq depend on physical cpu0 node to specify the
> >>>>>>>>>> OPPs and only that node is referred irrespective of the logical cpu
> >>>>>>>>>> accessing it. Alternatively to support cpuhotplug path, few drivers
> >>>>>>>>>> parse all the cpu nodes for OPPs. Instead we can specify the phandle
> >>>>>>>>>> of the node with which the current node shares the operating points.
> >>>>>>>>>>
> >>>>>>>>>> This patch adds support to specify the phandle in the operating points
> >>>>>>>>>> of any device node, where the node specified by the phandle holds the
> >>>>>>>>>> actual OPPs.
> >>>>>>>>>
> >>>>>>>>>> diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
> >>>>>>>>>
> >>>>>>>>>> +Optional properties:
> >>>>>>>>>> +- operating-points-phandle: phandle to the device node with which this
> >>>>>>>>>
> >>>>>>>>> That's a funny name. Bikeshedding a bit, how about shared-operating-points?
> >>>>>>>>>
> >>>>>>>>> I haven't thought at all about whether this change conceptually makes sense.
> >>>>>>>>>
> >>>>>>>>
> >>>>>>>> They may not really be shared- we could have phandle list even. one
> >>>>>>>> might have optional OPP sets for a chip family that one may  - I was
> >>>>>>>> about to suggest something similar to pinctrl
> >>>>>>>>
> >>>>>>> I am not sure if I follow you here, if each chip family has its unique
> >>>>>>> set of OPPs, why do we need to represent all of them together ?
> >>>>>>> IIUC you are thinking about having these in include dts file, used by
> >>>>>>> multiple chip/board dts.
> >>>>>>>
> >>>>>>>> operating-points-names = "default", "performance", "cheapboard-config" ;)
> >>>>>>>> operating-points-0 = <&...>
> >>>>>>>> operating-points-1 = <&...>
> >>>>>>>> operating-points-2 = <&...>
> >>>>>>>>
> >>>>>>> This looks more like a PM policy.
> >>>>>>
> >>>>>> Let me try to explain since SoCs such as OMAP/AM family dont make life
> >>>>>> trivial :)..
> >>>>>>
> >>>>>> An legacy example[1][2]
> >>>>>>
> >>>>>> SoC DM explains that the chip is capable of X opps:
> >>>>>> opp1, 2 - for all devices
> >>>>>> opp1,2, 3 - if efuse bit X@y is set
> >>>>>> opp1,2,3,4 - if efuse bit X@y is set AND Board design meets SoC vendors
> >>>>>> requirements (including additional features A, B is enabled).
> >>>>>>
> >>>>>> So, the same chip family has a hardware feature - not just as a pm
> >>>>>> policy of selecting among a set of OPPs which opp to work on, but the
> >>>>>> actual set of OPPs are actually options in themselves that is selected
> >>>>>> based on board's SoC selection.
> >>>>>
> >>>>> This sounds like we're describing a set of features not applicable to
> >>>>> the device, then removing them, rather than only describing those
> >>>>> features applicable to the device. If you have to probe to figure out
> >>>>> which values in the dt are applicable, I'm not sure I see the benefit of
> >>>>> describing said values in dt.
> >>>>
> >>>> Device has *options* of operating points sets it can operate at. It is
> >>>> not like "these are not applicable" for the device.
> >>>
> >>> I don't follow.
> >>>
> >>> In the example above, if efuse bit X@y is not set, opp3 is not
> >>> applicable, but we're describing it in dt. It's not an option for the
> >>> particular device, yet it appears in the device's dt.
> >> This one is easy - opp_enable/disable as discussed in
> >> http://marc.info/?l=linux-pm&m=137528631125365&w=2 should probably help.
> >
> > Wrong link? There's no reference to opp_enable or opp_disable...
> 
> Darned! sorry about that.. I think this is what I wanted to point yesterday.
> 
> http://marc.info/?l=linux-pm&m=137528603225295&w=2

Cheers.

> 
> >
> >>>
> >>> For opp4, it's even worse, as you're suggesting we describe an option
> >>> for the device that requires the driver to use some additional platform
> >>> knowledge to come to the conclusion that it cannot use. That sounds like
> >>
> >> Precisely.. it wont have that knowledge and should not need that
> >> knowledge. See explanation above.
> >>
> >> Specific examples: SoC vendors try to squeeze the max out of the chip,
> >> when voltage values are defined, they need to consider board markets
> >> that they try to address, pricepoints etc.. too many vectors.. not all
> >> board manufacturers care to meet SoC vendor requirements as they may not
> >> care about picking up the full potential of the chip - example -
> >> usecases on OMAP where ARM is seldom used and max DSP is used (video
> >> usecases) and others so they use a high performance chip, refuse to
> >> optimize vdd_mpu rail, dont care too much about higher ARM OPPs. Yeah, I
> >> could always tell them to hand edit the OPP entries and maintain kernel
> >> forks, but that is never the right thing to do.
> >
> > Sorry, but I still don't follow.
> >
> > We seem to be going over two cases, which both feel wrong to me:
> >
> > * One SoC used in multiple boards, where on some boards an OPP cannot be
> >    used because some requirement is not met. In this case, the board's
> >    dts (by including the SoC's dtsi) describes something that's not
> >    necessarily usable, and we seem to have no way to describe in the OPP
> >    table that the OPP is not usable for that board.
> 
> not at the moment at least - at least in the way we have described the 
> OPP in dts today.

Ok.

> 
> >
> > * Performance profiles, in which you have a set of OPP tables for
> >    "performance, "low-power", and whatever else. This arbitrary split
> >    seems like a configuration decision rather than a hardware description
> >    unless there is some hard limit that cannot be detected (e.g. the
> >    processor can function at some arbitrary high speed, but becomes hot
> >    enough to melt something, and there's no temperature sensor to handle
> >    this case dynamically).
> 
> precisely -> I think I point this out in this thread:
> http://marc.info/?l=devicetree&m=137535932402560&w=2

The one thing I don't like is the arbitrary grouping into profiles, as
the division is entirely a configuration decision. The operating points
themselves are a hardware capability, and it may make sense to describe
the feasible points for a device in the dt, but I don't want to have
different profiles exported because it straddles the line of the dt
telling us how to use the hardware rather than what the hardware is, and
will come back to bite us later if we want to handle cpu frequency
decisions differently.

I also have a suspicion that this will end up having a subset of sane
values, and Linux won't necessarily be able to do any interpolation of
values without additional platform knowledge.

> 
> >
> > Have I've misunderstood something?
> >
> >>
> >>
> >>> device knowledge internal to a driver, not how you describe an instance
> >>> of a device to an OS.
> >>
> >> OPP has never been a device - it is a performance point at which to
> >> operate a device. I am not sure if we are discussing about phandle
> >> definition of OPP is an issue or options of operating-point sets is an
> >> issue now.
> >>
> >>>
> >>> Have I misunderstood something here?
> >>
> >> Are you suggesting we have OPP tables per board?
> >
> > Yes, for the reasons I give above. Common OPP tabless can easily be
> > factored into separate include files to allow for arbitrary composition.
> 
> Hmm.. that could be one other way to do it..
> 
> 
> >
> >>
> >>>
> >>>>
> >>>> DT does have to describe the hardware capability - that was it's entire
> >>>> intent. operating points are valid configurations where it can be
> >>>> operated at - and when you have options of configurations you need to
> >>>> choose from based on the board you are using it on, it still retains
> >>>> "hardware behavior" aspect.
> >>>
> >>> The dt should describe the particular board you're running on. As I see
> >>> it what you're suggesting is equivalent to describing some hardware in
> >>> the dt that isn't actually present, then relying on the OS to poke
> >>> around somewhere else, figure out that the hardware isn't present, and
> >>> then forget that the dt described it.
> >> I will buy that eventual dtb should contain some way to choose the OPP
> >> that the particular board can operate on.
> >>
> >> SoC dtsi is what we define, this allows multiple board dts to use them.
> >> the moment we start defining OPPs per board, all mayhem breaks loose.
> >>
> >> SoC dtsi provides options for the SoC to be operated upon, it is like
> >> saying I have 10 Uarts, but board dts chooses to enable the ones it
> >> uses. pinctrl we do the same. why cant we do with operating-points as well?
> >
> > That's a possibility if we define a standard mechanism for stating OPPs
> > are unusable (rather than having to probe the device to figure that
> > out).
> >
> 
> I did make a proposal here: 
> http://marc.info/?l=devicetree&m=137530056230441&w=2
> 
> Do you see it making sense, If yes, I can help flesh out the idea with code.

I can see one of the example mechanisms working for describing OPPs
being usable, but I'm still concerned about the division into profiles.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep KarkadaNagesha Aug. 2, 2013, 1:43 p.m. UTC | #24
On 01/08/13 17:49, Stephen Warren wrote:
> On 08/01/2013 07:54 AM, Mark Rutland wrote:
> ...
>> We seem to be going over two cases, which both feel wrong to me:
>>
>> * One SoC used in multiple boards, where on some boards an OPP cannot be
>>   used because some requirement is not met. In this case, the board's
>>   dts (by including the SoC's dtsi) describes something that's not
>>   necessarily usable, and we seem to have no way to describe in the OPP
>>   table that the OPP is not usable for that board.
> 
> There are probably a lot of examples of this already. For example, for
> pinctrl, people often want the SoC .dtsi file to include "pin
> configuration nodes" (see
> Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt) for many
> common pinmux configurations in the SoC .dtsi file, so that board files
> can simply refer to the already-existing nodes rather than having to
> write everything from scratch. Obviously, not all common configurations
> are used by every board.
> 
> ...
Agreed, but I am not convinced with the comparison(pinmux and OPPs).
The main concern I have is that if some developer wants to experiment
with various configurations provided by SoC(e.g. I have seen some SoC
where the pinmux have multiple functions and you can chose one of them)
But that's not true with OPPs, if someone experiments with wrong OPP
profile, then it might damage the board permanently.

Regards,
Sudeep

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Aug. 6, 2013, 1:29 p.m. UTC | #25
On 14:43-20130802, Sudeep KarkadaNagesha wrote:
> On 01/08/13 17:49, Stephen Warren wrote:
> > On 08/01/2013 07:54 AM, Mark Rutland wrote:
> > ...
> >> We seem to be going over two cases, which both feel wrong to me:
> >>
> >> * One SoC used in multiple boards, where on some boards an OPP cannot be
> >>   used because some requirement is not met. In this case, the board's
> >>   dts (by including the SoC's dtsi) describes something that's not
> >>   necessarily usable, and we seem to have no way to describe in the OPP
> >>   table that the OPP is not usable for that board.
> > 
> > There are probably a lot of examples of this already. For example, for
> > pinctrl, people often want the SoC .dtsi file to include "pin
> > configuration nodes" (see
> > Documentation/devicetree/bindings/pinctrl/pinctrl-bindings.txt) for many
> > common pinmux configurations in the SoC .dtsi file, so that board files
> > can simply refer to the already-existing nodes rather than having to
> > write everything from scratch. Obviously, not all common configurations
> > are used by every board.
> > 
> > ...
> Agreed, but I am not convinced with the comparison(pinmux and OPPs).
> The main concern I have is that if some developer wants to experiment
> with various configurations provided by SoC(e.g. I have seen some SoC
> where the pinmux have multiple functions and you can chose one of them)
> But that's not true with OPPs, if someone experiments with wrong OPP
> profile, then it might damage the board permanently.

Even today, nothing prevents folks from "adding custom OPPs" at their
own personal risk here - We have seen folks do this as part of board
files - Now, for that matter, there is nothing that prevents folks
linking the wrong LDO or setting wrong LDO voltage and damaging the
board either.

I mean, at the level where we "describe" the hardware and it's
operation, you cannot be idiot or experimenter-proof - If these were
to be considered paramount and prevent us from choosing the right
concept that is needed for as many SoCs as possible, it'd be a shame.
Nishanth Menon Aug. 6, 2013, 1:45 p.m. UTC | #26
On 14:15-20130802, Mark Rutland wrote:
> On Thu, Aug 01, 2013 at 05:25:06PM +0100, Nishanth Menon wrote:
> > On 08/01/2013 08:54 AM, Mark Rutland wrote:
> > > On Wed, Jul 31, 2013 at 05:27:39PM +0100, Nishanth Menon wrote:
> > >> On 07/31/2013 11:11 AM, Mark Rutland wrote:
> > >>> On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
> > >>>> On 07/31/2013 10:29 AM, Mark Rutland wrote:
> > >>>>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
> > >>>>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
> > >>>>>>> On 30/07/13 21:48, Nishanth Menon wrote:
> > >>>>>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
> > >>>>>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
[...]
> > 
> > >
> > > * Performance profiles, in which you have a set of OPP tables for
> > >    "performance, "low-power", and whatever else. This arbitrary split
> > >    seems like a configuration decision rather than a hardware description
> > >    unless there is some hard limit that cannot be detected (e.g. the
> > >    processor can function at some arbitrary high speed, but becomes hot
> > >    enough to melt something, and there's no temperature sensor to handle
> > >    this case dynamically).
> > 
> > precisely -> I think I point this out in this thread:
> > http://marc.info/?l=devicetree&m=137535932402560&w=2
> 
> The one thing I don't like is the arbitrary grouping into profiles, as
> the division is entirely a configuration decision. The operating points
> themselves are a hardware capability, and it may make sense to describe
> the feasible points for a device in the dt, but I don't want to have
> different profiles exported because it straddles the line of the dt
> telling us how to use the hardware rather than what the hardware is, and
> will come back to bite us later if we want to handle cpu frequency
> decisions differently.

I can understand why it seems to wrongly indicate *how* to use the
hardware, rather than *what the hardware is* - Lets try it this way:
- if Bit X is set in efuse, one cannot use high performance mode
- If PDN (Power Distribution Network) guidelines are not met, one cannot
use high performance mode.

These constrain *hardware capability* you can do on that SoC+Board
combination - that is exactly what we have been struggling to describe
here. These are not *how to use hardware* profiles, but *hardware
capability* profiles whose selection is upto to the System in
discussion - example - SoC x will decide on bit based decision and
forbid Board file overrides while an SoC y family might choose another
path.. Framework and dts should not dictate policy and we dont try to
do that here.

How to use the hardware within the *capability costraints* is upto
drivers, there is no attempt to define that in my proposal.

> I also have a suspicion that this will end up having a subset of sane
> values, and Linux won't necessarily be able to do any interpolation of
> values without additional platform knowledge.

These may be SoC dependent. An fixed regulator might be acceptable on
SoC x, but not on y (as leakage angle might make it infeasible).

[...]
> > I did make a proposal here: 
> > http://marc.info/?l=devicetree&m=137530056230441&w=2
> > 
> > Do you see it making sense, If yes, I can help flesh out the idea with code.
> 
> I can see one of the example mechanisms working for describing OPPs
> being usable, but I'm still concerned about the division into profiles.
Above.

Does this make sense? I understand and share the concern about potential
misuse, but I am all open ears of how we'd like to ensure data is
completely separated out from kernel source.
Mark Rutland Aug. 7, 2013, 4:17 p.m. UTC | #27
On Tue, Aug 06, 2013 at 02:45:34PM +0100, Nishanth Menon wrote:
> On 14:15-20130802, Mark Rutland wrote:
> > On Thu, Aug 01, 2013 at 05:25:06PM +0100, Nishanth Menon wrote:
> > > On 08/01/2013 08:54 AM, Mark Rutland wrote:
> > > > On Wed, Jul 31, 2013 at 05:27:39PM +0100, Nishanth Menon wrote:
> > > >> On 07/31/2013 11:11 AM, Mark Rutland wrote:
> > > >>> On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
> > > >>>> On 07/31/2013 10:29 AM, Mark Rutland wrote:
> > > >>>>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
> > > >>>>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
> > > >>>>>>> On 30/07/13 21:48, Nishanth Menon wrote:
> > > >>>>>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
> > > >>>>>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
> [...]
> > > 
> > > >
> > > > * Performance profiles, in which you have a set of OPP tables for
> > > >    "performance, "low-power", and whatever else. This arbitrary split
> > > >    seems like a configuration decision rather than a hardware description
> > > >    unless there is some hard limit that cannot be detected (e.g. the
> > > >    processor can function at some arbitrary high speed, but becomes hot
> > > >    enough to melt something, and there's no temperature sensor to handle
> > > >    this case dynamically).
> > > 
> > > precisely -> I think I point this out in this thread:
> > > http://marc.info/?l=devicetree&m=137535932402560&w=2
> > 
> > The one thing I don't like is the arbitrary grouping into profiles, as
> > the division is entirely a configuration decision. The operating points
> > themselves are a hardware capability, and it may make sense to describe
> > the feasible points for a device in the dt, but I don't want to have
> > different profiles exported because it straddles the line of the dt
> > telling us how to use the hardware rather than what the hardware is, and
> > will come back to bite us later if we want to handle cpu frequency
> > decisions differently.
> 
> I can understand why it seems to wrongly indicate *how* to use the
> hardware, rather than *what the hardware is* - Lets try it this way:
> - if Bit X is set in efuse, one cannot use high performance mode
> - If PDN (Power Distribution Network) guidelines are not met, one cannot
> use high performance mode.
> 
> These constrain *hardware capability* you can do on that SoC+Board
> combination - that is exactly what we have been struggling to describe
> here. These are not *how to use hardware* profiles, but *hardware
> capability* profiles whose selection is upto to the System in
> discussion - example - SoC x will decide on bit based decision and
> forbid Board file overrides while an SoC y family might choose another
> path.. Framework and dts should not dictate policy and we dont try to
> do that here.
> 
> How to use the hardware within the *capability costraints* is upto
> drivers, there is no attempt to define that in my proposal.

I'm happy to have the OPPs, as your arguments certainly make sense. My
only concern is that if we have them grouped in some fashion in dt (e.g.
profiles), people will use this as configuration, treating the groups of
OPPs differnetly (prefering a 'performance' or 'low-power' profile). I'd
prefer that any decision on how to use the provided OPP values were done
in the kernel dynamically.

I suspect even if we remove profile names, people will attempt to read
some semantics into the groupings. For that reason, I'd prefer to have a
single OPP table for any device (though this table could be shared by
devices).

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep KarkadaNagesha Aug. 20, 2013, 10 a.m. UTC | #28
On 07/08/13 17:17, Mark Rutland wrote:
> On Tue, Aug 06, 2013 at 02:45:34PM +0100, Nishanth Menon wrote:
>> On 14:15-20130802, Mark Rutland wrote:
>>> On Thu, Aug 01, 2013 at 05:25:06PM +0100, Nishanth Menon wrote:
>>>> On 08/01/2013 08:54 AM, Mark Rutland wrote:
>>>>> On Wed, Jul 31, 2013 at 05:27:39PM +0100, Nishanth Menon wrote:
>>>>>> On 07/31/2013 11:11 AM, Mark Rutland wrote:
>>>>>>> On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
>>>>>>>> On 07/31/2013 10:29 AM, Mark Rutland wrote:
>>>>>>>>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
>>>>>>>>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
>>>>>>>>>>> On 30/07/13 21:48, Nishanth Menon wrote:
>>>>>>>>>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
>>>>>>>>>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>> [...]
>>>>
>>>>>
>>>>> * Performance profiles, in which you have a set of OPP tables for
>>>>>    "performance, "low-power", and whatever else. This arbitrary split
>>>>>    seems like a configuration decision rather than a hardware description
>>>>>    unless there is some hard limit that cannot be detected (e.g. the
>>>>>    processor can function at some arbitrary high speed, but becomes hot
>>>>>    enough to melt something, and there's no temperature sensor to handle
>>>>>    this case dynamically).
>>>>
>>>> precisely -> I think I point this out in this thread:
>>>> http://marc.info/?l=devicetree&m=137535932402560&w=2
>>>
>>> The one thing I don't like is the arbitrary grouping into profiles, as
>>> the division is entirely a configuration decision. The operating points
>>> themselves are a hardware capability, and it may make sense to describe
>>> the feasible points for a device in the dt, but I don't want to have
>>> different profiles exported because it straddles the line of the dt
>>> telling us how to use the hardware rather than what the hardware is, and
>>> will come back to bite us later if we want to handle cpu frequency
>>> decisions differently.
>>
>> I can understand why it seems to wrongly indicate *how* to use the
>> hardware, rather than *what the hardware is* - Lets try it this way:
>> - if Bit X is set in efuse, one cannot use high performance mode
>> - If PDN (Power Distribution Network) guidelines are not met, one cannot
>> use high performance mode.
>>
>> These constrain *hardware capability* you can do on that SoC+Board
>> combination - that is exactly what we have been struggling to describe
>> here. These are not *how to use hardware* profiles, but *hardware
>> capability* profiles whose selection is upto to the System in
>> discussion - example - SoC x will decide on bit based decision and
>> forbid Board file overrides while an SoC y family might choose another
>> path.. Framework and dts should not dictate policy and we dont try to
>> do that here.
>>
>> How to use the hardware within the *capability costraints* is upto
>> drivers, there is no attempt to define that in my proposal.
> 
> I'm happy to have the OPPs, as your arguments certainly make sense. My
> only concern is that if we have them grouped in some fashion in dt (e.g.
> profiles), people will use this as configuration, treating the groups of
> OPPs differnetly (prefering a 'performance' or 'low-power' profile). I'd
> prefer that any decision on how to use the provided OPP values were done
> in the kernel dynamically.
> 
> I suspect even if we remove profile names, people will attempt to read
> some semantics into the groupings. For that reason, I'd prefer to have a
> single OPP table for any device (though this table could be shared by
> devices).
> 

Until we get more feedback and agreement on new proposal can we have
this simple amendment in this patch to the existing binding ? Since the
new proposal[1] is backward compatible(this patch adding support for
option#5 to existing option#1), we will have to add support for other
binding options in [1] later.

This is needed to support shared OPPs with simple/single OPP profile
and also to fix the broken and unused binding
@Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt

Regards,
Sudeep

[1] http://www.spinics.net/lists/cpufreq/msg06563.html


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nishanth Menon Aug. 20, 2013, 2:01 p.m. UTC | #29
On 08/20/2013 05:00 AM, Sudeep KarkadaNagesha wrote:
> On 07/08/13 17:17, Mark Rutland wrote:
>> On Tue, Aug 06, 2013 at 02:45:34PM +0100, Nishanth Menon wrote:
>>> On 14:15-20130802, Mark Rutland wrote:
>>>> On Thu, Aug 01, 2013 at 05:25:06PM +0100, Nishanth Menon wrote:
>>>>> On 08/01/2013 08:54 AM, Mark Rutland wrote:
>>>>>> On Wed, Jul 31, 2013 at 05:27:39PM +0100, Nishanth Menon wrote:
>>>>>>> On 07/31/2013 11:11 AM, Mark Rutland wrote:
>>>>>>>> On Wed, Jul 31, 2013 at 04:58:22PM +0100, Nishanth Menon wrote:
>>>>>>>>> On 07/31/2013 10:29 AM, Mark Rutland wrote:
>>>>>>>>>> On Wed, Jul 31, 2013 at 03:46:34PM +0100, Nishanth Menon wrote:
>>>>>>>>>>> On 07/31/2013 06:14 AM, Sudeep KarkadaNagesha wrote:
>>>>>>>>>>>> On 30/07/13 21:48, Nishanth Menon wrote:
>>>>>>>>>>>>> On 07/30/2013 01:34 PM, Stephen Warren wrote:
>>>>>>>>>>>>>> On 07/30/2013 12:00 PM, Sudeep KarkadaNagesha wrote:
>>> [...]
>>>>>
>>>>>>
>>>>>> * Performance profiles, in which you have a set of OPP tables for
>>>>>>    "performance, "low-power", and whatever else. This arbitrary split
>>>>>>    seems like a configuration decision rather than a hardware description
>>>>>>    unless there is some hard limit that cannot be detected (e.g. the
>>>>>>    processor can function at some arbitrary high speed, but becomes hot
>>>>>>    enough to melt something, and there's no temperature sensor to handle
>>>>>>    this case dynamically).
>>>>>
>>>>> precisely -> I think I point this out in this thread:
>>>>> http://marc.info/?l=devicetree&m=137535932402560&w=2
>>>>
>>>> The one thing I don't like is the arbitrary grouping into profiles, as
>>>> the division is entirely a configuration decision. The operating points
>>>> themselves are a hardware capability, and it may make sense to describe
>>>> the feasible points for a device in the dt, but I don't want to have
>>>> different profiles exported because it straddles the line of the dt
>>>> telling us how to use the hardware rather than what the hardware is, and
>>>> will come back to bite us later if we want to handle cpu frequency
>>>> decisions differently.
>>>
>>> I can understand why it seems to wrongly indicate *how* to use the
>>> hardware, rather than *what the hardware is* - Lets try it this way:
>>> - if Bit X is set in efuse, one cannot use high performance mode
>>> - If PDN (Power Distribution Network) guidelines are not met, one cannot
>>> use high performance mode.
>>>
>>> These constrain *hardware capability* you can do on that SoC+Board
>>> combination - that is exactly what we have been struggling to describe
>>> here. These are not *how to use hardware* profiles, but *hardware
>>> capability* profiles whose selection is upto to the System in
>>> discussion - example - SoC x will decide on bit based decision and
>>> forbid Board file overrides while an SoC y family might choose another
>>> path.. Framework and dts should not dictate policy and we dont try to
>>> do that here.
>>>
>>> How to use the hardware within the *capability costraints* is upto
>>> drivers, there is no attempt to define that in my proposal.
>>
>> I'm happy to have the OPPs, as your arguments certainly make sense. My
>> only concern is that if we have them grouped in some fashion in dt (e.g.
>> profiles), people will use this as configuration, treating the groups of
>> OPPs differnetly (prefering a 'performance' or 'low-power' profile). I'd
>> prefer that any decision on how to use the provided OPP values were done
>> in the kernel dynamically.
>>
>> I suspect even if we remove profile names, people will attempt to read
>> some semantics into the groupings. For that reason, I'd prefer to have a
>> single OPP table for any device (though this table could be shared by
>> devices).
>>
> 
> Until we get more feedback and agreement on new proposal can we have
> this simple amendment in this patch to the existing binding ? Since the
> new proposal[1] is backward compatible(this patch adding support for
> option#5 to existing option#1), we will have to add support for other
> binding options in [1] later.
> 
> This is needed to support shared OPPs with simple/single OPP profile
> and also to fix the broken and unused binding
> @Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
> 
> Regards,
> Sudeep
> 
> [1] http://www.spinics.net/lists/cpufreq/msg06563.html


Could you post a non-RFC version of this series? As I had mentioned
earlier in the thread, I dont mind having this pulled in as stage 1 of
the transition to a more elaborate solution.
Sudeep KarkadaNagesha Aug. 20, 2013, 4:07 p.m. UTC | #30
On 20/08/13 15:01, Nishanth Menon wrote:
> On 08/20/2013 05:00 AM, Sudeep KarkadaNagesha wrote:
[...]
>> Until we get more feedback and agreement on new proposal can we have
>> this simple amendment in this patch to the existing binding ? Since the
>> new proposal[1] is backward compatible(this patch adding support for
>> option#5 to existing option#1), we will have to add support for other
>> binding options in [1] later.
>>
>> This is needed to support shared OPPs with simple/single OPP profile
>> and also to fix the broken and unused binding
>> @Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
>>
>> Regards,
>> Sudeep
>>
>> [1] http://www.spinics.net/lists/cpufreq/msg06563.html
> 
> 
> Could you post a non-RFC version of this series? As I had mentioned
> earlier in the thread, I dont mind having this pulled in as stage 1 of
> the transition to a more elaborate solution.
> 
Thanks Nishanth. I would wait till Mark and Stephen agree for this approach.

Regards,
Sudeep

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Stephen Warren Aug. 21, 2013, 10:48 p.m. UTC | #31
On 08/20/2013 04:00 AM, Sudeep KarkadaNagesha wrote:
...
> Until we get more feedback and agreement on new proposal can we have
> this simple amendment in this patch to the existing binding ? Since the
> new proposal[1] is backward compatible(this patch adding support for
> option#5 to existing option#1), we will have to add support for other
> binding options in [1] later.
> 
> This is needed to support shared OPPs with simple/single OPP profile
> and also to fix the broken and unused binding
> @Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
> 
> Regards,
> Sudeep
> 
> [1] http://www.spinics.net/lists/cpufreq/msg06563.html

Presumably the desire for cpu1's node to say "go look at cpu0's node for
OPP" is because they share OPPs. Don't they share OPPs because they are
parts of the same device - that device being the CPU complex. As such,
why not define the OPPs in /cpus rather than in each of /cpus/cpuN?

Of course, that doesn't help if there are separate CPU and GPU nodes
that just happen to have the same set of OPPs and you want to share them
to save DT space. Is that at all likely?

I'd suggest/bike-shed that operating-points-device is not the correct
property name; it somehow implies that the other device actively defines
the OPPs for this device, rather than just happening to have the same
OPPs. Perhaps "operating-points-identical-to"?
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Rutland Aug. 22, 2013, 11:59 a.m. UTC | #32
On Wed, Aug 21, 2013 at 11:48:12PM +0100, Stephen Warren wrote:
> On 08/20/2013 04:00 AM, Sudeep KarkadaNagesha wrote:
> ...
> > Until we get more feedback and agreement on new proposal can we have
> > this simple amendment in this patch to the existing binding ? Since the
> > new proposal[1] is backward compatible(this patch adding support for
> > option#5 to existing option#1), we will have to add support for other
> > binding options in [1] later.
> > 
> > This is needed to support shared OPPs with simple/single OPP profile
> > and also to fix the broken and unused binding
> > @Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
> > 
> > Regards,
> > Sudeep
> > 
> > [1] http://www.spinics.net/lists/cpufreq/msg06563.html
> 
> Presumably the desire for cpu1's node to say "go look at cpu0's node for
> OPP" is because they share OPPs. Don't they share OPPs because they are
> parts of the same device - that device being the CPU complex. As such,
> why not define the OPPs in /cpus rather than in each of /cpus/cpuN?

I'd very much like for it to be possible to factor out common properties
into the /cpus node, but it should follow the ePAPR recommendation fo
being treated as a fallback if not present in a particular /cpus/cpu@N
node -- that way we can handle clusters with differing OPPs. The
property might just be a phandle to a table node, but it should be
possible to make it common.

> 
> Of course, that doesn't help if there are separate CPU and GPU nodes
> that just happen to have the same set of OPPs and you want to share them
> to save DT space. Is that at all likely?

I suspect that the OPPs for CPUs and GPUs are likely to be quite
distict, and they are logically separate regardless. I'm not averse to
sharing of tables if we can handle them in a standard fashion.

> 
> I'd suggest/bike-shed that operating-points-device is not the correct
> property name; it somehow implies that the other device actively defines
> the OPPs for this device, rather than just happening to have the same
> OPPs. Perhaps "operating-points-identical-to"?
> 

I'd rather not have properties that point elsewhere and say "treat me
the same as this node". I'd rather we have common properties as
described above.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep KarkadaNagesha Aug. 22, 2013, 3:32 p.m. UTC | #33
On 22/08/13 12:59, Mark Rutland wrote:
> On Wed, Aug 21, 2013 at 11:48:12PM +0100, Stephen Warren wrote:
>> On 08/20/2013 04:00 AM, Sudeep KarkadaNagesha wrote:
>> ...
>>> Until we get more feedback and agreement on new proposal can we have
>>> this simple amendment in this patch to the existing binding ? Since the
>>> new proposal[1] is backward compatible(this patch adding support for
>>> option#5 to existing option#1), we will have to add support for other
>>> binding options in [1] later.
>>>
>>> This is needed to support shared OPPs with simple/single OPP profile
>>> and also to fix the broken and unused binding
>>> @Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
>>>
>>> Regards,
>>> Sudeep
>>>
>>> [1] http://www.spinics.net/lists/cpufreq/msg06563.html
>>
>> Presumably the desire for cpu1's node to say "go look at cpu0's node for
>> OPP" is because they share OPPs. Don't they share OPPs because they are
>> parts of the same device - that device being the CPU complex. As such,
>> why not define the OPPs in /cpus rather than in each of /cpus/cpuN?
> 
> I'd very much like for it to be possible to factor out common properties
> into the /cpus node, but it should follow the ePAPR recommendation fo
> being treated as a fallback if not present in a particular /cpus/cpu@N
> node -- that way we can handle clusters with differing OPPs. The
> property might just be a phandle to a table node, but it should be
> possible to make it common.
> 
Yes we can have this fallback mechanism, but only from cpu devices(OPP
library handles non-cpu devices too).

Referring the table node, I have a generic question on DT nodes.
Does each DT node have to represent a unique device ? If so having a
property common to one/more devices in a separate node doesn't sound
correct.

>>
>> Of course, that doesn't help if there are separate CPU and GPU nodes
>> that just happen to have the same set of OPPs and you want to share them
>> to save DT space. Is that at all likely?
> 
> I suspect that the OPPs for CPUs and GPUs are likely to be quite
> distict, and they are logically separate regardless. I'm not averse to
> sharing of tables if we can handle them in a standard fashion.
> 
IMO sharing OPPs just for saving DT space might lead to confusions(no
strong opinion though).

>>
>> I'd suggest/bike-shed that operating-points-device is not the correct
>> property name; it somehow implies that the other device actively defines
>> the OPPs for this device, rather than just happening to have the same
>> OPPs. Perhaps "operating-points-identical-to"?
>>
> 
> I'd rather not have properties that point elsewhere and say "treat me
> the same as this node". I'd rather we have common properties as
> described above.
> 
Agreed, but for platforms with multiple CPU clusters, since we have only
one /cpus node, we ned to have table node which is arguable if node has
represent a device(as mentioned above)

Regards,
Sudeep



--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Rutland Aug. 22, 2013, 3:50 p.m. UTC | #34
On Thu, Aug 22, 2013 at 04:32:10PM +0100, Sudeep KarkadaNagesha wrote:
> On 22/08/13 12:59, Mark Rutland wrote:
> > On Wed, Aug 21, 2013 at 11:48:12PM +0100, Stephen Warren wrote:
> >> On 08/20/2013 04:00 AM, Sudeep KarkadaNagesha wrote:
> >> ...
> >>> Until we get more feedback and agreement on new proposal can we have
> >>> this simple amendment in this patch to the existing binding ? Since the
> >>> new proposal[1] is backward compatible(this patch adding support for
> >>> option#5 to existing option#1), we will have to add support for other
> >>> binding options in [1] later.
> >>>
> >>> This is needed to support shared OPPs with simple/single OPP profile
> >>> and also to fix the broken and unused binding
> >>> @Documentation/devicetree/bindings/cpufreq/arm_big_little_dt.txt
> >>>
> >>> Regards,
> >>> Sudeep
> >>>
> >>> [1] http://www.spinics.net/lists/cpufreq/msg06563.html
> >>
> >> Presumably the desire for cpu1's node to say "go look at cpu0's node for
> >> OPP" is because they share OPPs. Don't they share OPPs because they are
> >> parts of the same device - that device being the CPU complex. As such,
> >> why not define the OPPs in /cpus rather than in each of /cpus/cpuN?
> > 
> > I'd very much like for it to be possible to factor out common properties
> > into the /cpus node, but it should follow the ePAPR recommendation fo
> > being treated as a fallback if not present in a particular /cpus/cpu@N
> > node -- that way we can handle clusters with differing OPPs. The
> > property might just be a phandle to a table node, but it should be
> > possible to make it common.
> > 
> Yes we can have this fallback mechanism, but only from cpu devices(OPP
> library handles non-cpu devices too).

Ok.

> 
> Referring the table node, I have a generic question on DT nodes.
> Does each DT node have to represent a unique device ? If so having a
> property common to one/more devices in a separate node doesn't sound
> correct.

I believe the answer is almost always yes. There are devices with
subnodes that represent some logical portion of the device, but I
couldn't think of any free-standing nodes that aren't a device or unique
firmware interface.

> 
> >>
> >> Of course, that doesn't help if there are separate CPU and GPU nodes
> >> that just happen to have the same set of OPPs and you want to share them
> >> to save DT space. Is that at all likely?
> > 
> > I suspect that the OPPs for CPUs and GPUs are likely to be quite
> > distict, and they are logically separate regardless. I'm not averse to
> > sharing of tables if we can handle them in a standard fashion.
> > 
> IMO sharing OPPs just for saving DT space might lead to confusions(no
> strong opinion though).

Agreed.

> 
> >>
> >> I'd suggest/bike-shed that operating-points-device is not the correct
> >> property name; it somehow implies that the other device actively defines
> >> the OPPs for this device, rather than just happening to have the same
> >> OPPs. Perhaps "operating-points-identical-to"?
> >>
> > 
> > I'd rather not have properties that point elsewhere and say "treat me
> > the same as this node". I'd rather we have common properties as
> > described above.
> > 
> Agreed, but for platforms with multiple CPU clusters, since we have only
> one /cpus node, we ned to have table node which is arguable if node has
> represent a device(as mentioned above)

I agree that this seems wasteful of space, but I really don't think that
pointing at another device you want the OPPs of is the best way of
describing the linkage, and I suspect we'll get all sorts of stupid bugs
resulting from that style of binding.

Consider the following (properties trimmed for brevity): 

cpus {
	cpu0: cpu@0 {
		operating-points-identical-to = <&cpu1>;
	};
	cpu1: cpu@1 {
		operating-points = <0 100>,
				   <23 47>,
				   <62 970>;
	}
};

If we boot a UP kernel on the above, I assume we won't read the info for
cpu1, and thus we won't get operating points info for cpu0. Worse, what
if cpu1 has status="disabled"? Does that make its OPP table invalid?
What if the bootloader drops cpu1?

I really don't like indirecting linkage to a property through an
arbitrary node, simply because it happened to have the same property. It
creates an artificial depdendency that will lead to problems.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep KarkadaNagesha Aug. 22, 2013, 4:28 p.m. UTC | #35
On 22/08/13 16:50, Mark Rutland wrote:
> On Thu, Aug 22, 2013 at 04:32:10PM +0100, Sudeep KarkadaNagesha wrote:
>> On 22/08/13 12:59, Mark Rutland wrote:
>>> On Wed, Aug 21, 2013 at 11:48:12PM +0100, Stephen Warren wrote:
[...]
>>>>
>>>> I'd suggest/bike-shed that operating-points-device is not the correct
>>>> property name; it somehow implies that the other device actively defines
>>>> the OPPs for this device, rather than just happening to have the same
>>>> OPPs. Perhaps "operating-points-identical-to"?
>>>>
>>>
>>> I'd rather not have properties that point elsewhere and say "treat me
>>> the same as this node". I'd rather we have common properties as
>>> described above.
>>>
>> Agreed, but for platforms with multiple CPU clusters, since we have only
>> one /cpus node, we ned to have table node which is arguable if node has
>> represent a device(as mentioned above)
> 
> I agree that this seems wasteful of space, but I really don't think that
> pointing at another device you want the OPPs of is the best way of
> describing the linkage, and I suspect we'll get all sorts of stupid bugs
> resulting from that style of binding.
> 
> Consider the following (properties trimmed for brevity): 
> 
> cpus {
> 	cpu0: cpu@0 {
> 		operating-points-identical-to = <&cpu1>;
> 	};
> 	cpu1: cpu@1 {
> 		operating-points = <0 100>,
> 				   <23 47>,
> 				   <62 970>;
> 	}
> };
> 
> If we boot a UP kernel on the above, I assume we won't read the info for
> cpu1, and thus we won't get operating points info for cpu0. Worse, what
> if cpu1 has status="disabled"? Does that make its OPP table invalid?
> What if the bootloader drops cpu1?
> 
Another question :), does status property dictate the validity of the
other properties in the node(not specific to above example, in general)?

But I agree that this indirect linkage is broken as boot-loaders can
drop the cpu node holding OPP.

IMO fallback method would be reasonable(and much cleaner compared to a
separate node) if it can even accommodate multiple cpu clusters.

> I really don't like indirecting linkage to a property through an
> arbitrary node, simply because it happened to have the same property. It
> creates an artificial depdendency that will lead to problems.
> 
I totally agree on this.

Regards,
Sudeep

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Mark Rutland Aug. 23, 2013, 12:26 p.m. UTC | #36
On Thu, Aug 22, 2013 at 05:28:10PM +0100, Sudeep KarkadaNagesha wrote:
> On 22/08/13 16:50, Mark Rutland wrote:
> > On Thu, Aug 22, 2013 at 04:32:10PM +0100, Sudeep KarkadaNagesha wrote:
> >> On 22/08/13 12:59, Mark Rutland wrote:
> >>> On Wed, Aug 21, 2013 at 11:48:12PM +0100, Stephen Warren wrote:
> [...]
> >>>>
> >>>> I'd suggest/bike-shed that operating-points-device is not the correct
> >>>> property name; it somehow implies that the other device actively defines
> >>>> the OPPs for this device, rather than just happening to have the same
> >>>> OPPs. Perhaps "operating-points-identical-to"?
> >>>>
> >>>
> >>> I'd rather not have properties that point elsewhere and say "treat me
> >>> the same as this node". I'd rather we have common properties as
> >>> described above.
> >>>
> >> Agreed, but for platforms with multiple CPU clusters, since we have only
> >> one /cpus node, we ned to have table node which is arguable if node has
> >> represent a device(as mentioned above)
> > 
> > I agree that this seems wasteful of space, but I really don't think that
> > pointing at another device you want the OPPs of is the best way of
> > describing the linkage, and I suspect we'll get all sorts of stupid bugs
> > resulting from that style of binding.
> > 
> > Consider the following (properties trimmed for brevity): 
> > 
> > cpus {
> > 	cpu0: cpu@0 {
> > 		operating-points-identical-to = <&cpu1>;
> > 	};
> > 	cpu1: cpu@1 {
> > 		operating-points = <0 100>,
> > 				   <23 47>,
> > 				   <62 970>;
> > 	}
> > };
> > 
> > If we boot a UP kernel on the above, I assume we won't read the info for
> > cpu1, and thus we won't get operating points info for cpu0. Worse, what
> > if cpu1 has status="disabled"? Does that make its OPP table invalid?
> > What if the bootloader drops cpu1?
> > 
> Another question :), does status property dictate the validity of the
> other properties in the node(not specific to above example, in general)?

That depends on what the binding in question describes for a disabled
status. In general, if a node is disabled I would imagine that
attempting to derive any knowledge from it is not a good idea.

> 
> But I agree that this indirect linkage is broken as boot-loaders can
> drop the cpu node holding OPP.
> 
> IMO fallback method would be reasonable(and much cleaner compared to a
> separate node) if it can even accommodate multiple cpu clusters.

Well, it may be that for separate clusters we just have to describe the
information per-cpu...

> 
> > I really don't like indirecting linkage to a property through an
> > arbitrary node, simply because it happened to have the same property. It
> > creates an artificial depdendency that will lead to problems.
> > 
> I totally agree on this.

Cool.

Thanks,
Mark.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/Documentation/devicetree/bindings/power/opp.txt b/Documentation/devicetree/bindings/power/opp.txt
index 74499e5..f66fd65 100644
--- a/Documentation/devicetree/bindings/power/opp.txt
+++ b/Documentation/devicetree/bindings/power/opp.txt
@@ -4,22 +4,112 @@  SoCs have a standard set of tuples consisting of frequency and
 voltage pairs that the device will support per voltage domain. These
 are called Operating Performance Points or OPPs.
 
-Properties:
+Required Properties:
 - operating-points: An array of 2-tuples items, and each item consists
   of frequency and voltage like <freq-kHz vol-uV>.
 	freq: clock frequency in kHz
 	vol: voltage in microvolt
 
+Optional properties:
+- operating-points-phandle: phandle to the device node with which this
+	device shares the operating points(recommended if multiple
+	devices are in the same clock domain and share OPPs, as it
+	avoids replication of OPPs)
+
 Examples:
 
-cpu@0 {
-	compatible = "arm,cortex-a9";
-	reg = <0>;
-	next-level-cache = <&L2>;
-	operating-points = <
-		/* kHz    uV */
-		792000  1100000
-		396000  950000
-		198000  850000
-	>;
-};
+1. A uniprocessor system
+
+	cpu@0 {
+		compatible = "arm,cortex-a9";
+		reg = <0>;
+		next-level-cache = <&L2>;
+		operating-points = <
+			/* kHz    uV */
+			792000  1100000
+			396000  950000
+			198000  850000
+		>;
+	};
+
+If more than one device of same type share the same OPPs, for example
+all the CPUs on a SoC or in a single cluster on a SoC, then we need to
+avoid replicating the OPPs in all the nodes. We can specify the phandle
+of the node with which the current node shares the operating points instead.
+
+2. Consider a SMP system with 4 CPUs in the same clock domain.
+
+	cpu0: cpu@0 {
+		compatible = "arm,cortex-a9";
+		reg = <0>;
+		next-level-cache = <&L2>;
+		operating-points = <
+			/* kHz    uV */
+			792000  1100000
+			396000  950000
+			198000  850000
+		>;
+	};
+
+	cpu1: cpu@1 {
+		compatible = "arm,cortex-a9";
+		reg = <1>;
+		next-level-cache = <&L2>;
+		operating-points-phandle = <&cpu0>;
+	};
+
+	cpu2: cpu@2 {
+		compatible = "arm,cortex-a9";
+		reg = <2>;
+		next-level-cache = <&L2>;
+		operating-points-phandle = <&cpu0>;
+	};
+
+	cpu3: cpu@3 {
+		compatible = "arm,cortex-a9";
+		reg = <3>;
+		next-level-cache = <&L2>;
+		operating-points-phandle = <&cpu0>;
+	};
+
+3. Consider an AMP(asymmetric multi-processor) sytem with 2 clusters of CPUs.
+   Each cluster has 2 CPUs and all the CPUs within the cluster share the clock
+   domain.
+
+	cpu0: cpu@0 {
+		compatible = "arm,cortex-a15";
+		reg = <0>;
+		operating-points = <
+			/* kHz    uV */
+			792000  1100000
+			396000  950000
+			198000  850000
+		>;
+		clock-latency = <100000>; /* 100us */
+	};
+
+	cpu1: cpu@1 {
+		compatible = "arm,cortex-a15";
+		reg = <1>;
+		clock-latency = <100000>; /* 100us */
+		operating-points-phandle = <&cpu0>;
+	};
+
+	cpu2: cpu@100 {
+		compatible = "arm,cortex-a7";
+		reg = <100>;
+		operating-points = <
+			/* kHz    uV */
+			792000  950000
+			396000  750000
+			198000  450000
+		>;
+		clock-latency = <100000>; /* 100us */
+	};
+
+	cpu3: cpu@101 {
+		compatible = "arm,cortex-a7";
+		reg = <101>;
+		operating-points-phandle = <&cpu2>;
+		clock-latency = <100000>; /* 100us */
+	};
diff --git a/drivers/base/power/opp.c b/drivers/base/power/opp.c
index c8ec186..9ac3c93 100644
--- a/drivers/base/power/opp.c
+++ b/drivers/base/power/opp.c
@@ -707,12 +707,20 @@  struct srcu_notifier_head *opp_get_notifier(struct device *dev)
 int of_init_opp_table(struct device *dev)
 {
 	const struct property *prop;
+	struct device_node *opp_node;
 	const __be32 *val;
 	int nr;
 
-	prop = of_find_property(dev->of_node, "operating-points", NULL);
-	if (!prop)
+	opp_node = of_parse_phandle(dev->of_node,
+					"operating-points-phandle", 0);
+	if (!opp_node) /* if no OPP phandle, search for OPPs in current node */
+		opp_node = dev->of_node;
+	prop = of_find_property(opp_node, "operating-points", NULL);
+	if (!prop) {
+		dev_warn(dev, "node %s missing operating-points property\n",
+							opp_node->full_name);
 		return -ENODEV;
+	}
 	if (!prop->value)
 		return -ENODATA;