diff mbox

[v4,1/1] PM / OPP: Fix get sharing cpus when hotplug is used

Message ID 20170726121151.7576-1-waldemarx.rymarkiewicz@intel.com (mailing list archive)
State Superseded, archived
Delegated to: Rafael Wysocki
Headers show

Commit Message

Waldemar Rymarkiewicz July 26, 2017, 12:11 p.m. UTC
We fail dev_pm_opp_of_get_sharing_cpus() when possible cpu device does not
exist. This can happen on platforms where not all possible CPUs are
available at start up ie. hotplugged out. Cpu device is not registered in
the system so we are not able to check struct device to set the sharing
CPUs bitmask properly.

Example (real use case):
2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available
for Linux at boot up. cpufreq-dt driver + opp v2 fail to register opp_table
due to the fact there is no struct device for cpu1 (remains offline at
bootup).

To solve the bug, stop using device struct to check device_node. Instead
get cpu device_node directly from device tree with of_get_cpu_node().

Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
---
 drivers/base/power/opp/of.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

Comments

Waldemar Rymarkiewicz July 26, 2017, 12:11 p.m. UTC | #1
Hi,

This patch fixes problem with setting opp shared cpus bitmask on
hotplugable systems. See patch commit message for details.

CHANGLOG:

v4  Added Reviewed-by: Stephen Boyd <sboyd@codeaurora.org> + use pr_err
    instead dev_err for missing opp node phandle.

v3  Acked-by: Viresh Kumar added

v2  Abandon v1 as it was not good enough. Still check all possible CPUs as in
    original code, but don't check device struct. Instead check device_node
    directly in DT.

v1  [PATCH] PM / OPP: Don't check not plugged in CPUs to avoid error
    Avoid error by checking all present cpus instead possible cpus.

Thanks,
/Waldek


Waldemar Rymarkiewicz (1):
  PM / OPP: Fix get sharing cpus when hotplug is used

 drivers/base/power/opp/of.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)
Sudeep Holla July 26, 2017, 1:10 p.m. UTC | #2
On 26/07/17 13:11, Waldemar Rymarkiewicz wrote:
> We fail dev_pm_opp_of_get_sharing_cpus() when possible cpu device does not
> exist. This can happen on platforms where not all possible CPUs are
> available at start up ie. hotplugged out. Cpu device is not registered in
> the system so we are not able to check struct device to set the sharing
> CPUs bitmask properly.
> 
> Example (real use case):
> 2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available
> for Linux at boot up. cpufreq-dt driver + opp v2 fail to register opp_table
> due to the fact there is no struct device for cpu1 (remains offline at
> bootup).
> 

Interesting, will cpu1/3 ever hotplugged in to Linux ?
Just wondering if we can change the logical cpu assignment.

Please note, I am not against this patch, just got curious on logical
cpu numbering.

> To solve the bug, stop using device struct to check device_node. Instead
> get cpu device_node directly from device tree with of_get_cpu_node().
> 
> Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
> ---
>  drivers/base/power/opp/of.c | 29 +++++++++++++++++------------
>  1 file changed, 17 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/base/power/opp/of.c b/drivers/base/power/opp/of.c
> index 57eec1c..e83bc15 100644
> --- a/drivers/base/power/opp/of.c
> +++ b/drivers/base/power/opp/of.c
> @@ -248,15 +248,22 @@ void dev_pm_opp_of_remove_table(struct device *dev)
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
>  
> -/* Returns opp descriptor node for a device, caller must do of_node_put() */
> -struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev)
> +/* Returns opp descriptor node for a device node, caller must
> + * do of_node_put() */
> +static struct device_node *_opp_of_get_opp_desc_node(struct device_node *np)
>  {
>  	/*
>  	 * There should be only ONE phandle present in "operating-points-v2"
>  	 * property.
>  	 */
>  
> -	return of_parse_phandle(dev->of_node, "operating-points-v2", 0);
> +	return of_parse_phandle(np, "operating-points-v2", 0);
> +}
> +
> +/* Returns opp descriptor node for a device, caller must do of_node_put() */
> +struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev)
> +{
> +	return _opp_of_get_opp_desc_node(dev->of_node);
>  }
>  EXPORT_SYMBOL_GPL(dev_pm_opp_of_get_opp_desc_node);
>  
> @@ -572,8 +579,7 @@ EXPORT_SYMBOL_GPL(dev_pm_opp_of_cpumask_add_table);
>  int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
>  				   struct cpumask *cpumask)
>  {
> -	struct device_node *np, *tmp_np;
> -	struct device *tcpu_dev;
> +	struct device_node *np, *tmp_np, *cpu_np;
>  	int cpu, ret = 0;
>  
>  	/* Get OPP descriptor node */
> @@ -593,19 +599,18 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
>  		if (cpu == cpu_dev->id)
>  			continue;
>  
> -		tcpu_dev = get_cpu_device(cpu);
> -		if (!tcpu_dev) {
> -			dev_err(cpu_dev, "%s: failed to get cpu%d device\n",
> +		cpu_np = of_get_cpu_node(cpu, NULL);

Does this mean you get cpu_np != NULL even for cpu1/3 ? If yes, is that
OK to set the mask here ?

Can we make use of of_cpu_device_node_get instead above ? We don't want
keep parsing the device tree every time as the pointers are stashed in
device struct.
Waldemar Rymarkiewicz July 26, 2017, 1:54 p.m. UTC | #3
On 26 July 2017 at 15:10, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
>
> On 26/07/17 13:11, Waldemar Rymarkiewicz wrote:
>> We fail dev_pm_opp_of_get_sharing_cpus() when possible cpu device does not
>> exist. This can happen on platforms where not all possible CPUs are
>> available at start up ie. hotplugged out. Cpu device is not registered in
>> the system so we are not able to check struct device to set the sharing
>> CPUs bitmask properly.
>>
>> Example (real use case):
>> 2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available
>> for Linux at boot up. cpufreq-dt driver + opp v2 fail to register opp_table
>> due to the fact there is no struct device for cpu1 (remains offline at
>> bootup).
>>
>
> Interesting, will cpu1/3 ever hotplugged in to Linux ?

Yes, but it's specific to the platform.

Consider a system which has short-term real time requirements. We can
put some CPUs offline for Linux and give them  for another RTOS or
bare metal code to execute.

The concept is described well here
http://wiki.prplfoundation.org/w/images/d/df/MIPS_OS_Remote_Processor_Driver_Whitepaper_1.0.9.pdf

> Just wondering if we can change the logical cpu assignment.
> Please note, I am not against this patch, just got curious on logical
> cpu numbering.
>
>> To solve the bug, stop using device struct to check device_node. Instead
>> get cpu device_node directly from device tree with of_get_cpu_node().
>>
>> Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
>> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
>> ---
>>  drivers/base/power/opp/of.c | 29 +++++++++++++++++------------
>>  1 file changed, 17 insertions(+), 12 deletions(-)
>>
>> diff --git a/drivers/base/power/opp/of.c b/drivers/base/power/opp/of.c
>> index 57eec1c..e83bc15 100644
>> --- a/drivers/base/power/opp/of.c
>> +++ b/drivers/base/power/opp/of.c
>> +     struct device_node *np, *tmp_np, *cpu_np;
>>       int cpu, ret = 0;
>>
>>       /* Get OPP descriptor node */
>> @@ -593,19 +599,18 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
>>               if (cpu == cpu_dev->id)
>>                       continue;
>>
>> -             tcpu_dev = get_cpu_device(cpu);
>> -             if (!tcpu_dev) {
>> -                     dev_err(cpu_dev, "%s: failed to get cpu%d device\n",
>> +             cpu_np = of_get_cpu_node(cpu, NULL);
>
> Does this mean you get cpu_np != NULL even for cpu1/3 ?

cpu_np exists for all logical cpus defined in DT.  System boots with
all logical CPUs, but platform code (mips) reconfigures that later on.

> If yes, is that  OK to set the mask here ?

We do so, but before we need to check if this cpu_np shares the opp phandle.

>
> Can we make use of of_cpu_device_node_get instead above ? We don't want
> keep parsing the device tree every time as the pointers are stashed in
> device struct.

This is exactly what I want to change, not to use 'device struct'
because for cpu1/3 I don't have these structs.

/Waldek
Sudeep Holla July 26, 2017, 2:08 p.m. UTC | #4
On 26/07/17 14:54, Waldemar Rymarkiewicz wrote:
> On 26 July 2017 at 15:10, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>>
>> On 26/07/17 13:11, Waldemar Rymarkiewicz wrote:
>>> We fail dev_pm_opp_of_get_sharing_cpus() when possible cpu device does not
>>> exist. This can happen on platforms where not all possible CPUs are
>>> available at start up ie. hotplugged out. Cpu device is not registered in
>>> the system so we are not able to check struct device to set the sharing
>>> CPUs bitmask properly.
>>>
>>> Example (real use case):
>>> 2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available
>>> for Linux at boot up. cpufreq-dt driver + opp v2 fail to register opp_table
>>> due to the fact there is no struct device for cpu1 (remains offline at
>>> bootup).
>>>
>>
>> Interesting, will cpu1/3 ever hotplugged in to Linux ?
> 
> Yes, but it's specific to the platform.
> 
> Consider a system which has short-term real time requirements. We can
> put some CPUs offline for Linux and give them  for another RTOS or
> bare metal code to execute.
> 
> The concept is described well here
> http://wiki.prplfoundation.org/w/images/d/df/MIPS_OS_Remote_Processor_Driver_Whitepaper_1.0.9.pdf
> 

Indeed, I got to this paper when I saw this patch and was checking on
VPE/TC on MIPS :)

>> Just wondering if we can change the logical cpu assignment.
>> Please note, I am not against this patch, just got curious on logical
>> cpu numbering.
>>
>>> To solve the bug, stop using device struct to check device_node. Instead
>>> get cpu device_node directly from device tree with of_get_cpu_node().
>>>
>>> Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
>>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
>>> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
>>> ---
>>>  drivers/base/power/opp/of.c | 29 +++++++++++++++++------------
>>>  1 file changed, 17 insertions(+), 12 deletions(-)
>>>
>>> diff --git a/drivers/base/power/opp/of.c b/drivers/base/power/opp/of.c
>>> index 57eec1c..e83bc15 100644
>>> --- a/drivers/base/power/opp/of.c
>>> +++ b/drivers/base/power/opp/of.c
>>> +     struct device_node *np, *tmp_np, *cpu_np;
>>>       int cpu, ret = 0;
>>>
>>>       /* Get OPP descriptor node */
>>> @@ -593,19 +599,18 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
>>>               if (cpu == cpu_dev->id)
>>>                       continue;
>>>
>>> -             tcpu_dev = get_cpu_device(cpu);
>>> -             if (!tcpu_dev) {
>>> -                     dev_err(cpu_dev, "%s: failed to get cpu%d device\n",
>>> +             cpu_np = of_get_cpu_node(cpu, NULL);
>>
>> Does this mean you get cpu_np != NULL even for cpu1/3 ?
> 
> cpu_np exists for all logical cpus defined in DT.  System boots with
> all logical CPUs, but platform code (mips) reconfigures that later on.
> 

Fine, that's what I assumed but can't understand how is this case any
different from normal hotplug ?

CPUs won't be unregistered on hotplug path and hence I can't figure out
why get_cpu_device would fail if all the CPUs were registered on boot.
Just trying to understand the scenario and how it differs from normal
hotplug case.

>> If yes, is that  OK to set the mask here ?
> 
> We do so, but before we need to check if this cpu_np shares the opp phandle.
> 
>>
>> Can we make use of of_cpu_device_node_get instead above ? We don't want
>> keep parsing the device tree every time as the pointers are stashed in
>> device struct.
> 
> This is exactly what I want to change, not to use 'device struct'
> because for cpu1/3 I don't have these structs.
> 

Understood, but I am just trying to avoid that. I still want to use
of_cpu_device_node_get as I recently posted a patch[1] to deal with such
case in that function.

[1] https://patchwork.kernel.org/patch/9859453/
Waldemar Rymarkiewicz July 26, 2017, 2:41 p.m. UTC | #5
On 26 July 2017 at 16:08, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
>
> On 26/07/17 14:54, Waldemar Rymarkiewicz wrote:
>> On 26 July 2017 at 15:10, Sudeep Holla <sudeep.holla@arm.com> wrote:
>
>
>>>> diff --git a/drivers/base/power/opp/of.c b/drivers/base/power/opp/of.c
>>>> index 57eec1c..e83bc15 100644
>>>> --- a/drivers/base/power/opp/of.c
>>>> +++ b/drivers/base/power/opp/of.c
>>>> +     struct device_node *np, *tmp_np, *cpu_np;
>>>>       int cpu, ret = 0;
>>>>
>>>>       /* Get OPP descriptor node */
>>>> @@ -593,19 +599,18 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
>>>>               if (cpu == cpu_dev->id)
>>>>                       continue;
>>>>
>>>> -             tcpu_dev = get_cpu_device(cpu);
>>>> -             if (!tcpu_dev) {
>>>> -                     dev_err(cpu_dev, "%s: failed to get cpu%d device\n",
>>>> +             cpu_np = of_get_cpu_node(cpu, NULL);
>>>
>>> Does this mean you get cpu_np != NULL even for cpu1/3 ?
>>
>> cpu_np exists for all logical cpus defined in DT.  System boots with
>> all logical CPUs, but platform code (mips) reconfigures that later on.
>>
>
> Fine, that's what I assumed but can't understand how is this case any
> different from normal hotplug ?
>
> CPUs won't be unregistered on hotplug path and hence I can't figure out
> why get_cpu_device would fail if all the CPUs were registered on boot.
> Just trying to understand the scenario and how it differs from normal
> hotplug case.

Frankly, I did not analyse this so much, but I just saw that, in my
case, mips reconfigures CPUs before topology_init() is called. This
function registers cpus that are set in cpu_present_mask. Therefore I
cannot find "struct device" for all *possible* CPUs.
See [1]

>
>>> Can we make use of of_cpu_device_node_get instead above ? We don't want
>>> keep parsing the device tree every time as the pointers are stashed in
>>> device struct.
>>
>> This is exactly what I want to change, not to use 'device struct'
>> because for cpu1/3 I don't have these structs.
>>
> Understood, but I am just trying to avoid that. I still want to use
> of_cpu_device_node_get as I recently posted a patch[1] to deal with such
> case in that function.

Well, I see I do exactly what you try to avoid with your patch :)
I agree this makes really sense so I can use your
of_cpu_device_node_get once it's in the tree.

-               tcpu_dev = get_cpu_device(cpu);
-               if (!tcpu_dev) {
-                       dev_err(cpu_dev, "%s: failed to get cpu%d device\n",
+               cpu_np = of_cpu_device_node_get(cpu);
+               if (!cpu_np) {
+                       dev_err(cpu_dev, "%s: failed to get cpu%d node\n",


[1] http://elixir.free-electrons.com/linux/latest/source/arch/mips/kernel/topology.c#L19

/Waldek
Sudeep Holla July 26, 2017, 3:20 p.m. UTC | #6
On 26/07/17 15:41, Waldemar Rymarkiewicz wrote:
> On 26 July 2017 at 16:08, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>>
>> On 26/07/17 14:54, Waldemar Rymarkiewicz wrote:
>>> On 26 July 2017 at 15:10, Sudeep Holla <sudeep.holla@arm.com> wrote:
>>
>>
>>>>> diff --git a/drivers/base/power/opp/of.c b/drivers/base/power/opp/of.c
>>>>> index 57eec1c..e83bc15 100644
>>>>> --- a/drivers/base/power/opp/of.c
>>>>> +++ b/drivers/base/power/opp/of.c
>>>>> +     struct device_node *np, *tmp_np, *cpu_np;
>>>>>       int cpu, ret = 0;
>>>>>
>>>>>       /* Get OPP descriptor node */
>>>>> @@ -593,19 +599,18 @@ int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
>>>>>               if (cpu == cpu_dev->id)
>>>>>                       continue;
>>>>>
>>>>> -             tcpu_dev = get_cpu_device(cpu);
>>>>> -             if (!tcpu_dev) {
>>>>> -                     dev_err(cpu_dev, "%s: failed to get cpu%d device\n",
>>>>> +             cpu_np = of_get_cpu_node(cpu, NULL);
>>>>
>>>> Does this mean you get cpu_np != NULL even for cpu1/3 ?
>>>
>>> cpu_np exists for all logical cpus defined in DT.  System boots with
>>> all logical CPUs, but platform code (mips) reconfigures that later on.
>>>
>>
>> Fine, that's what I assumed but can't understand how is this case any
>> different from normal hotplug ?
>>
>> CPUs won't be unregistered on hotplug path and hence I can't figure out
>> why get_cpu_device would fail if all the CPUs were registered on boot.
>> Just trying to understand the scenario and how it differs from normal
>> hotplug case.
> 
> Frankly, I did not analyse this so much, but I just saw that, in my
> case, mips reconfigures CPUs before topology_init() is called. This
> function registers cpus that are set in cpu_present_mask. Therefore I
> cannot find "struct device" for all *possible* CPUs.
> See [1]

Which means it's never marked present in the boot ?
Looks like it's removed quite early even before CPU's are registered.
So I am not sure if they are ever used on Linux, in which case it's
better to fix the logical numbering itself.

Anyways we need to use of_cpu_device_node_get once my patch is in the tree.
Rafael J. Wysocki July 27, 2017, 12:03 a.m. UTC | #7
On Wednesday, July 26, 2017 02:11:38 PM Waldemar Rymarkiewicz wrote:
> We fail dev_pm_opp_of_get_sharing_cpus() when possible cpu device does not
> exist. This can happen on platforms where not all possible CPUs are
> available at start up ie. hotplugged out. Cpu device is not registered in
> the system so we are not able to check struct device to set the sharing
> CPUs bitmask properly.
> 
> Example (real use case):
> 2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available
> for Linux at boot up. cpufreq-dt driver + opp v2 fail to register opp_table
> due to the fact there is no struct device for cpu1 (remains offline at
> bootup).
> 
> To solve the bug, stop using device struct to check device_node. Instead
> get cpu device_node directly from device tree with of_get_cpu_node().
> 
> Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>

Please fix the build issue reported by 0-day introduced by this patch.

Thanks,
Rafael
Viresh Kumar July 27, 2017, 3:43 a.m. UTC | #8
On 26-07-17, 15:08, Sudeep Holla wrote:
> CPUs won't be unregistered on hotplug path

You sure? I don't you will get confused with this as you already know
all this, but I will still write it down for the sake of clarity.

We aren't talking about offlining a CPU here, but a real physical
hotplug. Why should the kernel keep a device structure for a device if
it isn't physically present on the system?

acpi_processor_remove() removes the CPU device and I thought it was
all related to that hotplug.

Am I missing something ?

> and hence I can't figure out
> why get_cpu_device would fail if all the CPUs were registered on boot.
> Just trying to understand the scenario and how it differs from normal
> hotplug case.
Waldemar Rymarkiewicz July 27, 2017, 7:25 a.m. UTC | #9
On 27 July 2017 at 02:03, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, July 26, 2017 02:11:38 PM Waldemar Rymarkiewicz wrote:
>> We fail dev_pm_opp_of_get_sharing_cpus() when possible cpu device does not
>> exist. This can happen on platforms where not all possible CPUs are
>> available at start up ie. hotplugged out. Cpu device is not registered in
>> the system so we are not able to check struct device to set the sharing
>> CPUs bitmask properly.
>>
>> Example (real use case):
>> 2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available
>> for Linux at boot up. cpufreq-dt driver + opp v2 fail to register opp_table
>> due to the fact there is no struct device for cpu1 (remains offline at
>> bootup).
>>
>> To solve the bug, stop using device struct to check device_node. Instead
>> get cpu device_node directly from device tree with of_get_cpu_node().
>>
>> Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
>> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
>
> Please fix the build issue reported by 0-day introduced by this patch.

Where can I find the report? I did not receive anything.

Thanks,
/Waldek
Viresh Kumar July 27, 2017, 7:31 a.m. UTC | #10
On 27-07-17, 09:25, Waldemar Rymarkiewicz wrote:
> On 27 July 2017 at 02:03, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> > On Wednesday, July 26, 2017 02:11:38 PM Waldemar Rymarkiewicz wrote:
> >> We fail dev_pm_opp_of_get_sharing_cpus() when possible cpu device does not
> >> exist. This can happen on platforms where not all possible CPUs are
> >> available at start up ie. hotplugged out. Cpu device is not registered in
> >> the system so we are not able to check struct device to set the sharing
> >> CPUs bitmask properly.
> >>
> >> Example (real use case):
> >> 2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available
> >> for Linux at boot up. cpufreq-dt driver + opp v2 fail to register opp_table
> >> due to the fact there is no struct device for cpu1 (remains offline at
> >> bootup).
> >>
> >> To solve the bug, stop using device struct to check device_node. Instead
> >> get cpu device_node directly from device tree with of_get_cpu_node().
> >>
> >> Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
> >> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
> >> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
> >
> > Please fix the build issue reported by 0-day introduced by this patch.
> 
> Where can I find the report? I did not receive anything.

It was sent to your email id:
Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>

Here is the link:

http://marc.info/?l=linux-pm&m=150110740706429&w=2
Waldemar Rymarkiewicz July 27, 2017, 10:01 a.m. UTC | #11
On 27 July 2017 at 02:03, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Wednesday, July 26, 2017 02:11:38 PM Waldemar Rymarkiewicz wrote:
>> We fail dev_pm_opp_of_get_sharing_cpus() when possible cpu device does not
>> exist. This can happen on platforms where not all possible CPUs are
>> available at start up ie. hotplugged out. Cpu device is not registered in
>> the system so we are not able to check struct device to set the sharing
>> CPUs bitmask properly.
>>
>> Example (real use case):
>> 2 physical MIPS cores, 4 VPE, cpu0/2 run Linux and cpu1/3 are not available
>> for Linux at boot up. cpufreq-dt driver + opp v2 fail to register opp_table
>> due to the fact there is no struct device for cpu1 (remains offline at
>> bootup).
>>
>> To solve the bug, stop using device struct to check device_node. Instead
>> get cpu device_node directly from device tree with of_get_cpu_node().
>>
>> Signed-off-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com>
>> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>

>> Reviewed-by: Stephen Boyd <sboyd@codeaurora.org>
>
> Please fix the build issue reported by 0-day introduced by this patch.

err.. alldefconfig does not enable CONFIG_OF.  I have updated my
travis file, so it builds now all{yes|no|def}config.
sending v5 in a minute.

I would prefer to take this patch as it is if no one is against and
once Sudeep's patch [1] is in the tree, I will push another one
replacing  of_get_cpu_node with of_cpu_device_node_ge().

[1] https://patchwork.kernel.org/patch/9859453/

/Waldek
kernel test robot July 30, 2017, 2:54 p.m. UTC | #12
Hi Waldemar,

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.13-rc2 next-20170728]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Waldemar-Rymarkiewicz/PM-OPP-Fix-get-sharing-cpus-when-hotplug-is-used/20170728-221141
base:   https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git linux-next
config: ia64-allyesconfig (attached as .config)
compiler: ia64-linux-gcc (GCC) 6.2.0
reproduce:
        wget https://raw.githubusercontent.com/01org/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # save the attached .config to linux build tree
        make.cross ARCH=ia64 

All errors (new ones prefixed by >>):

   In file included from include/linux/kernel.h:13:0,
                    from include/linux/list.h:8,
                    from include/linux/kobject.h:20,
                    from include/linux/device.h:17,
                    from include/linux/node.h:17,
                    from include/linux/cpu.h:16,
                    from drivers/base/power/opp/of.c:16:
   drivers/base/power/opp/of.c: In function 'dev_pm_opp_of_get_sharing_cpus':
>> drivers/base/power/opp/of.c:617:45: error: 'cpu_node' undeclared (first use in this function)
       pr_err("%pOF: Couldn't find opp node\n", cpu_node);
                                                ^
   include/linux/printk.h:301:33: note: in definition of macro 'pr_err'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
                                    ^~~~~~~~~~~
   drivers/base/power/opp/of.c:617:45: note: each undeclared identifier is reported only once for each function it appears in
       pr_err("%pOF: Couldn't find opp node\n", cpu_node);
                                                ^
   include/linux/printk.h:301:33: note: in definition of macro 'pr_err'
     printk(KERN_ERR pr_fmt(fmt), ##__VA_ARGS__)
                                    ^~~~~~~~~~~

vim +/cpu_node +617 drivers/base/power/opp/of.c

   565	
   566	/*
   567	 * Works only for OPP v2 bindings.
   568	 *
   569	 * Returns -ENOENT if operating-points-v2 bindings aren't supported.
   570	 */
   571	/**
   572	 * dev_pm_opp_of_get_sharing_cpus() - Get cpumask of CPUs sharing OPPs with
   573	 *				      @cpu_dev using operating-points-v2
   574	 *				      bindings.
   575	 *
   576	 * @cpu_dev:	CPU device for which we do this operation
   577	 * @cpumask:	cpumask to update with information of sharing CPUs
   578	 *
   579	 * This updates the @cpumask with CPUs that are sharing OPPs with @cpu_dev.
   580	 *
   581	 * Returns -ENOENT if operating-points-v2 isn't present for @cpu_dev.
   582	 */
   583	int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
   584					   struct cpumask *cpumask)
   585	{
   586		struct device_node *np, *tmp_np, *cpu_np;
   587		int cpu, ret = 0;
   588	
   589		/* Get OPP descriptor node */
   590		np = dev_pm_opp_of_get_opp_desc_node(cpu_dev);
   591		if (!np) {
   592			dev_dbg(cpu_dev, "%s: Couldn't find opp node.\n", __func__);
   593			return -ENOENT;
   594		}
   595	
   596		cpumask_set_cpu(cpu_dev->id, cpumask);
   597	
   598		/* OPPs are shared ? */
   599		if (!of_property_read_bool(np, "opp-shared"))
   600			goto put_cpu_node;
   601	
   602		for_each_possible_cpu(cpu) {
   603			if (cpu == cpu_dev->id)
   604				continue;
   605	
   606			cpu_np = of_get_cpu_node(cpu, NULL);
   607			if (!cpu_np) {
   608				dev_err(cpu_dev, "%s: failed to get cpu%d node\n",
   609					__func__, cpu);
   610				ret = -ENOENT;
   611				goto put_cpu_node;
   612			}
   613	
   614			/* Get OPP descriptor node */
   615			tmp_np = _opp_of_get_opp_desc_node(cpu_np);
   616			if (!tmp_np) {
 > 617				pr_err("%pOF: Couldn't find opp node\n", cpu_node);

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
Rafael J. Wysocki July 31, 2017, 11:14 a.m. UTC | #13
On Sunday, July 30, 2017 10:54:30 PM kbuild test robot wrote:
> Hi Waldemar,
> 
> [auto build test ERROR on pm/linux-next]
> [also build test ERROR on v4.13-rc2 next-20170728]
> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

OK, I'm dropping this commit.

Please make sure that the problem is fixed before sending a new version of it.

Thanks,
Rafael
Waldemar Rymarkiewicz July 31, 2017, 3:12 p.m. UTC | #14
On 31 July 2017 at 13:14, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> On Sunday, July 30, 2017 10:54:30 PM kbuild test robot wrote:
>> Hi Waldemar,
>>
>> [auto build test ERROR on pm/linux-next]
>> [also build test ERROR on v4.13-rc2 next-20170728]
>> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>
> OK, I'm dropping this commit.
>
> Please make sure that the problem is fixed before sending a new version of it.

I am a bit confused, it was already fixed and sent out in v5 [1], but
I see v5 is in 'Not Applicable, archived' state. Why?

[1] https://patchwork.kernel.org/patch/9866525/

Thanks,
/Waldek
Rafael J. Wysocki July 31, 2017, 6:50 p.m. UTC | #15
On Mon, Jul 31, 2017 at 5:12 PM, Waldemar Rymarkiewicz
<waldemar.rymarkiewicz@gmail.com> wrote:
> On 31 July 2017 at 13:14, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> On Sunday, July 30, 2017 10:54:30 PM kbuild test robot wrote:
>>> Hi Waldemar,
>>>
>>> [auto build test ERROR on pm/linux-next]
>>> [also build test ERROR on v4.13-rc2 next-20170728]
>>> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>>
>> OK, I'm dropping this commit.
>>
>> Please make sure that the problem is fixed before sending a new version of it.
>
> I am a bit confused, it was already fixed and sent out in v5 [1], but
> I see v5 is in 'Not Applicable, archived' state. Why?
>
> [1] https://patchwork.kernel.org/patch/9866525/

Because it was applied and still broke.

Thanks,
Rafael
Viresh Kumar Aug. 1, 2017, 4:37 a.m. UTC | #16
On 31-07-17, 20:50, Rafael J. Wysocki wrote:
> On Mon, Jul 31, 2017 at 5:12 PM, Waldemar Rymarkiewicz
> <waldemar.rymarkiewicz@gmail.com> wrote:
> > On 31 July 2017 at 13:14, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
> >> On Sunday, July 30, 2017 10:54:30 PM kbuild test robot wrote:
> >>> Hi Waldemar,
> >>>
> >>> [auto build test ERROR on pm/linux-next]
> >>> [also build test ERROR on v4.13-rc2 next-20170728]
> >>> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
> >>
> >> OK, I'm dropping this commit.
> >>
> >> Please make sure that the problem is fixed before sending a new version of it.
> >
> > I am a bit confused, it was already fixed and sent out in v5 [1], but
> > I see v5 is in 'Not Applicable, archived' state. Why?
> >
> > [1] https://patchwork.kernel.org/patch/9866525/
> 
> Because it was applied and still broke.

Not actually.

The bug report *wasn't* reported for the patch you applied, but how would be the
build go if V4 is applied on top of pm/linux-next. And so buildbot replied to
this thread.

It can sometimes get a bit confusing seriously, I have to read it very carefully
today after the discussions you two had :)

You can re-apply the commit as that was doing the right thing:

commit 63a48cf920c0 ("PM / OPP: Fix get sharing CPUs when hotplug is used")

@Fengguang: We need to somehow indicate (in BOLD/CAPS?) that the report isn't
for a commit in the tree. Yeah, you would normally include the commit sha in
that case and so we can actually distinguish between the two cases. But
practically, people are getting confused unless they read it very very
carefully. Thanks for the great job btw :)
Rafael J. Wysocki Aug. 1, 2017, 11:53 a.m. UTC | #17
On Tue, Aug 1, 2017 at 6:37 AM, Viresh Kumar <viresh.kumar@linaro.org> wrote:
> On 31-07-17, 20:50, Rafael J. Wysocki wrote:
>> On Mon, Jul 31, 2017 at 5:12 PM, Waldemar Rymarkiewicz
>> <waldemar.rymarkiewicz@gmail.com> wrote:
>> > On 31 July 2017 at 13:14, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> >> On Sunday, July 30, 2017 10:54:30 PM kbuild test robot wrote:
>> >>> Hi Waldemar,
>> >>>
>> >>> [auto build test ERROR on pm/linux-next]
>> >>> [also build test ERROR on v4.13-rc2 next-20170728]
>> >>> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>> >>
>> >> OK, I'm dropping this commit.
>> >>
>> >> Please make sure that the problem is fixed before sending a new version of it.
>> >
>> > I am a bit confused, it was already fixed and sent out in v5 [1], but
>> > I see v5 is in 'Not Applicable, archived' state. Why?
>> >
>> > [1] https://patchwork.kernel.org/patch/9866525/
>>
>> Because it was applied and still broke.
>
> Not actually.
>
> The bug report *wasn't* reported for the patch you applied, but how would be the
> build go if V4 is applied on top of pm/linux-next. And so buildbot replied to
> this thread.
>
> It can sometimes get a bit confusing seriously, I have to read it very carefully
> today after the discussions you two had :)

Yeah, I overlooked that, thanks!

>
> You can re-apply the commit as that was doing the right thing:
>
> commit 63a48cf920c0 ("PM / OPP: Fix get sharing CPUs when hotplug is used")

OK

Thanks,
Rafael
kernel test robot Aug. 3, 2017, 1:50 a.m. UTC | #18
On Tue, Aug 01, 2017 at 10:07:08AM +0530, Viresh Kumar wrote:
>On 31-07-17, 20:50, Rafael J. Wysocki wrote:
>> On Mon, Jul 31, 2017 at 5:12 PM, Waldemar Rymarkiewicz
>> <waldemar.rymarkiewicz@gmail.com> wrote:
>> > On 31 July 2017 at 13:14, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> >> On Sunday, July 30, 2017 10:54:30 PM kbuild test robot wrote:
>> >>> Hi Waldemar,
>> >>>
>> >>> [auto build test ERROR on pm/linux-next]
>> >>> [also build test ERROR on v4.13-rc2 next-20170728]
>> >>> [if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
>> >>
>> >> OK, I'm dropping this commit.
>> >>
>> >> Please make sure that the problem is fixed before sending a new version of it.
>> >
>> > I am a bit confused, it was already fixed and sent out in v5 [1], but
>> > I see v5 is in 'Not Applicable, archived' state. Why?
>> >
>> > [1] https://patchwork.kernel.org/patch/9866525/
>>
>> Because it was applied and still broke.
>
>Not actually.
>
>The bug report *wasn't* reported for the patch you applied, but how would be the
>build go if V4 is applied on top of pm/linux-next. And so buildbot replied to
>this thread.
>
>It can sometimes get a bit confusing seriously, I have to read it very carefully
>today after the discussions you two had :)
>
>You can re-apply the commit as that was doing the right thing:
>
>commit 63a48cf920c0 ("PM / OPP: Fix get sharing CPUs when hotplug is used")
>
>@Fengguang: We need to somehow indicate (in BOLD/CAPS?) that the report isn't
>for a commit in the tree.

For patch testing, the report will start with such a paragraph

[auto build test ERROR on pm/linux-next]
[also build test ERROR on v4.13-rc2 next-20170728]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

which should be obvious enough. How about you take them as indicators
that it's a LKML patch instead of git tree commit? :)

>Yeah, you would normally include the commit sha in that case and so
>we can actually distinguish between the two cases. But practically,
>people are getting confused unless they read it very very carefully.
>Thanks for the great job btw :)

You are welcome!

Regards,
Fengguang
diff mbox

Patch

diff --git a/drivers/base/power/opp/of.c b/drivers/base/power/opp/of.c
index 57eec1c..e83bc15 100644
--- a/drivers/base/power/opp/of.c
+++ b/drivers/base/power/opp/of.c
@@ -248,15 +248,22 @@  void dev_pm_opp_of_remove_table(struct device *dev)
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_of_remove_table);
 
-/* Returns opp descriptor node for a device, caller must do of_node_put() */
-struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev)
+/* Returns opp descriptor node for a device node, caller must
+ * do of_node_put() */
+static struct device_node *_opp_of_get_opp_desc_node(struct device_node *np)
 {
 	/*
 	 * There should be only ONE phandle present in "operating-points-v2"
 	 * property.
 	 */
 
-	return of_parse_phandle(dev->of_node, "operating-points-v2", 0);
+	return of_parse_phandle(np, "operating-points-v2", 0);
+}
+
+/* Returns opp descriptor node for a device, caller must do of_node_put() */
+struct device_node *dev_pm_opp_of_get_opp_desc_node(struct device *dev)
+{
+	return _opp_of_get_opp_desc_node(dev->of_node);
 }
 EXPORT_SYMBOL_GPL(dev_pm_opp_of_get_opp_desc_node);
 
@@ -572,8 +579,7 @@  EXPORT_SYMBOL_GPL(dev_pm_opp_of_cpumask_add_table);
 int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
 				   struct cpumask *cpumask)
 {
-	struct device_node *np, *tmp_np;
-	struct device *tcpu_dev;
+	struct device_node *np, *tmp_np, *cpu_np;
 	int cpu, ret = 0;
 
 	/* Get OPP descriptor node */
@@ -593,19 +599,18 @@  int dev_pm_opp_of_get_sharing_cpus(struct device *cpu_dev,
 		if (cpu == cpu_dev->id)
 			continue;
 
-		tcpu_dev = get_cpu_device(cpu);
-		if (!tcpu_dev) {
-			dev_err(cpu_dev, "%s: failed to get cpu%d device\n",
+		cpu_np = of_get_cpu_node(cpu, NULL);
+		if (!cpu_np) {
+			dev_err(cpu_dev, "%s: failed to get cpu%d node\n",
 				__func__, cpu);
-			ret = -ENODEV;
+			ret = -ENOENT;
 			goto put_cpu_node;
 		}
 
 		/* Get OPP descriptor node */
-		tmp_np = dev_pm_opp_of_get_opp_desc_node(tcpu_dev);
+		tmp_np = _opp_of_get_opp_desc_node(cpu_np);
 		if (!tmp_np) {
-			dev_err(tcpu_dev, "%s: Couldn't find opp node.\n",
-				__func__);
+			pr_err("%pOF: Couldn't find opp node\n", cpu_node);
 			ret = -ENOENT;
 			goto put_cpu_node;
 		}