diff mbox

[v3,5/5] thermal: cpu_cooling: update the cpu device when cpufreq updates the policy cpu

Message ID 1425316643-31991-6-git-send-email-javi.merino@arm.com (mailing list archive)
State Superseded, archived
Delegated to: Eduardo Valentin
Headers show

Commit Message

Javi Merino March 2, 2015, 5:17 p.m. UTC
From: Kapileshwar Singh <kapileshwar.singh@arm.com>

When cpufreq changes the policy cpu, we need to update our cached cpu
device accordingly.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
---
 drivers/thermal/cpu_cooling.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Viresh Kumar March 3, 2015, 4:03 a.m. UTC | #1
On Mon, Mar 2, 2015 at 10:47 PM, Javi Merino <javi.merino@arm.com> wrote:
> From: Kapileshwar Singh <kapileshwar.singh@arm.com>
>
> When cpufreq changes the policy cpu, we need to update our cached cpu
> device accordingly.
>
> Cc: Zhang Rui <rui.zhang@intel.com>
> Cc: Eduardo Valentin <edubezval@gmail.com>
> Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
> ---
>  drivers/thermal/cpu_cooling.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
> index c4974144c787..e306d6bc3cf1 100644
> --- a/drivers/thermal/cpu_cooling.c
> +++ b/drivers/thermal/cpu_cooling.c
> @@ -269,6 +269,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>                 mutex_unlock(&cooling_cpufreq_lock);
>                 break;
>
> +       case CPUFREQ_UPDATE_POLICY_CPU:
> +               update_cpu_device(policy->cpu);
> +               break;
>         case CPUFREQ_CREATE_POLICY:
>                 update_cpu_device(policy->cpu);
>                 break;

First of all, I wasn't able to find 3/5 on LKML and I looked at 3/7
from an earlier
version to look at how update_cpu_device() looks like.

What I couldn't understand is why do you need to update things if policy->cpu
is changing ?

I am expecting a detailed answer here according to your design, and we may
be able to work out without such updates. Lets see..
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kapileshwar Singh March 3, 2015, 10:59 a.m. UTC | #2
Hi Viresh!

On 03/03/15 04:03, Viresh Kumar wrote:
> On Mon, Mar 2, 2015 at 10:47 PM, Javi Merino <javi.merino@arm.com> wrote:
>> From: Kapileshwar Singh <kapileshwar.singh@arm.com>
>>
>> When cpufreq changes the policy cpu, we need to update our cached cpu
>> device accordingly.
>>
>> Cc: Zhang Rui <rui.zhang@intel.com>
>> Cc: Eduardo Valentin <edubezval@gmail.com>
>> Signed-off-by: Kapileshwar Singh <kapileshwar.singh@arm.com>
>> ---
>>  drivers/thermal/cpu_cooling.c | 3 +++
>>  1 file changed, 3 insertions(+)
>>
>> diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
>> index c4974144c787..e306d6bc3cf1 100644
>> --- a/drivers/thermal/cpu_cooling.c
>> +++ b/drivers/thermal/cpu_cooling.c
>> @@ -269,6 +269,9 @@ static int cpufreq_thermal_notifier(struct notifier_block *nb,
>>                 mutex_unlock(&cooling_cpufreq_lock);
>>                 break;
>>
>> +       case CPUFREQ_UPDATE_POLICY_CPU:
>> +               update_cpu_device(policy->cpu);
>> +               break;
>>         case CPUFREQ_CREATE_POLICY:
>>                 update_cpu_device(policy->cpu);
>>                 break;
> 
> First of all, I wasn't able to find 3/5 on LKML and I looked at 3/7
> from an earlier
> version to look at how update_cpu_device() looks like.
> 
> What I couldn't understand is why do you need to update things if policy->cpu
> is changing ?
> 
We store the device pointer of the lead CPU (policy CPU) in a cpufreq domain as a part of the 
cpufreq_cooling_device data structure. There is one cpufreq_cooling_device per 
cpufreq domain. 

We need the device to find out the current OPP for the cpufreq_cooling_device for our static power calculation.

        opp = opp_find_freq_exact(cpu_dev, freq_hz, true);
        voltage = dev_pm_opp_get_voltage(opp);


The problem we are trying to solve here is:

When this lead CPU gets hotplugged out, the device pointer becomes stale and the policy 
cpu for the cpufreq domain changes. We then store the new policy CPU's device pointer for the 
in cpufreq_cooling_device on the reception of a notification from cpufreq. Being open to your 
suggestions for any other possible ways to solve the problem..

Regards, 
KP


> I am expecting a detailed answer here according to your design, and we may
> be able to work out without such updates. Lets see..
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar March 3, 2015, 11:19 a.m. UTC | #3
On 3 March 2015 at 16:29, Kapileshwar Singh <kapileshwar.singh@arm.com> wrote:
> We store the device pointer of the lead CPU (policy CPU) in a cpufreq domain as a part of the
> cpufreq_cooling_device data structure. There is one cpufreq_cooling_device per
> cpufreq domain.
>
> We need the device to find out the current OPP for the cpufreq_cooling_device for our static power calculation.
>
>         opp = opp_find_freq_exact(cpu_dev, freq_hz, true);
>         voltage = dev_pm_opp_get_voltage(opp);
>
>
> The problem we are trying to solve here is:
>
> When this lead CPU gets hotplugged out, the device pointer becomes stale and the policy
> cpu for the cpufreq domain changes. We then store the new policy CPU's device pointer for the
> in cpufreq_cooling_device on the reception of a notification from cpufreq. Being open to your
> suggestions for any other possible ways to solve the problem..

I would have loved that if life was that simple :)

So, the OPP library today isn't that perfect and so is this doing rounds [1].
The problem is the OPPs are initialized per device today and even if they
are shared by multiple CPUs, OPP library doesn't know about it.

So, if the policy->cpu goes away, OPP APIs on the new CPU will not work
as OPPs are only initialized for one CPU and not for others within the same
policy :)

The way cpufreq-dt is taking care of this is by saving cpu_dev of the first
CPU for which OPPs are initialized and always using that even if the CPU
goes away. And you need to do exactly that.

And please, do test such scenario before sending the patches again. As
it would have simply failed in this case, have you given it a try ..

Once my patchset [1] is applied, life would be very simple and we can
call OPP library for any CPU, but that is going to take some time.

--
viresh

[1] https://www.marc.info/?l=linaro-kernel&m=142364262800650&w=3
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kapileshwar Singh March 3, 2015, 11:41 a.m. UTC | #4
On 03/03/15 11:19, Viresh Kumar wrote:
> On 3 March 2015 at 16:29, Kapileshwar Singh <kapileshwar.singh@arm.com> wrote:
>> We store the device pointer of the lead CPU (policy CPU) in a cpufreq domain as a part of the
>> cpufreq_cooling_device data structure. There is one cpufreq_cooling_device per
>> cpufreq domain.
>>
>> We need the device to find out the current OPP for the cpufreq_cooling_device for our static power calculation.
>>
>>         opp = opp_find_freq_exact(cpu_dev, freq_hz, true);
>>         voltage = dev_pm_opp_get_voltage(opp);
>>
>>
>> The problem we are trying to solve here is:
>>
>> When this lead CPU gets hotplugged out, the device pointer becomes stale and the policy
>> cpu for the cpufreq domain changes. We then store the new policy CPU's device pointer for the
>> in cpufreq_cooling_device on the reception of a notification from cpufreq. Being open to your
>> suggestions for any other possible ways to solve the problem..
> 
> I would have loved that if life was that simple :)
> 
> So, the OPP library today isn't that perfect and so is this doing rounds [1].
> The problem is the OPPs are initialized per device today and even if they
> are shared by multiple CPUs, OPP library doesn't know about it.
> 
> So, if the policy->cpu goes away, OPP APIs on the new CPU will not work
> as OPPs are only initialized for one CPU and not for others within the same
> policy :)
> 
> The way cpufreq-dt is taking care of this is by saving cpu_dev of the first
> CPU for which OPPs are initialized and always using that even if the CPU
> goes away. And you need to do exactly that.
> 
> And please, do test such scenario before sending the patches again. As
> it would have simply failed in this case, have you given it a try ..

Yes I indeed tested the case where we cache the device pointer of the CPU for which the OPP's are populated.
When this CPU is hotplugged out, it invalidates the device pointer itself. Here are the error we get in dmesg:

..
<3>[67203.216774] opp_get_voltage: Invalid parameters
<3>[67203.326774] opp_get_voltage: Invalid parameters
<3>[67203.326774] opp_get_voltage: Invalid parameters
..

Which happens because:

unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
{
..
        tmp_opp = rcu_dereference(opp);
        if (unlikely(IS_ERR_OR_NULL(tmp_opp)) || !tmp_opp->available)
                pr_err("%s: Invalid parameters\n", __func__);
        else
..

Which happens when 

        opp = dev_pm_opp_find_freq_exact(cpufreq_device->cpu_dev, freq_hz,
                                         true);

returns a an erroneous or NULL OPP or the opp is unavailable (in the above condition) 

Regards, 
KP





> 
> Once my patchset [1] is applied, life would be very simple and we can
> call OPP library for any CPU, but that is going to take some time.
> 
> --
> viresh
> 
> [1] https://www.marc.info/?l=linaro-kernel&m=142364262800650&w=3
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar March 3, 2015, 1:07 p.m. UTC | #5
On 3 March 2015 at 17:11, Kapileshwar Singh <kapileshwar.singh@arm.com> wrote:
> Yes I indeed tested the case where we cache the device pointer of the CPU for which the OPP's are populated.
> When this CPU is hotplugged out, it invalidates the device pointer itself. Here are the error we get in dmesg:

What do you mean by 'invalidates the device pointer' ? that cpu_dev is NULL ?

> <3>[67203.216774] opp_get_voltage: Invalid parameters
> <3>[67203.326774] opp_get_voltage: Invalid parameters
> <3>[67203.326774] opp_get_voltage: Invalid parameters

Have you handwritten them ? Why don't they precede with dev_pm_* ??

>
> Which happens because:
>
> unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
> {
> ..
>         tmp_opp = rcu_dereference(opp);
>         if (unlikely(IS_ERR_OR_NULL(tmp_opp)) || !tmp_opp->available)
>                 pr_err("%s: Invalid parameters\n", __func__);

This %s should print routine name ..

>         else
> ..
>
> Which happens when
>
>         opp = dev_pm_opp_find_freq_exact(cpufreq_device->cpu_dev, freq_hz,
>                                          true);
>
> returns a an erroneous or NULL OPP or the opp is unavailable (in the above condition)

Please goto the depth of this thing, as I don't think it should happen.

Over that I was asking you if you have tested the solution Javi gave,
because OPPs
wouldn't have been initialized for other CPUs once policy->cpu goes down.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kapileshwar Singh March 3, 2015, 3:09 p.m. UTC | #6
On 03/03/15 13:07, Viresh Kumar wrote:
> On 3 March 2015 at 17:11, Kapileshwar Singh <kapileshwar.singh@arm.com> wrote:
>> Yes I indeed tested the case where we cache the device pointer of the CPU for which the OPP's are populated.
>> When this CPU is hotplugged out, it invalidates the device pointer itself. Here are the error we get in dmesg:
> 
> What do you mean by 'invalidates the device pointer' ? that cpu_dev is NULL ?

The cpu_dev is not NULL but we get an erroneous OPP back. We found the problem lies in the way we calculate the frequency for the cluster.

>> <3>[67203.216774] opp_get_voltage: Invalid parameters
>> <3>[67203.326774] opp_get_voltage: Invalid parameters
>> <3>[67203.326774] opp_get_voltage: Invalid parameters
> 
> Have you handwritten them ? Why don't they precede with dev_pm_* ??

I have not handwritten them, It was from a Linaro 3.10 based kernel when I first noticed this issue but the same problem exists in mainline. 

Apologies for this I sent you an older trace which I had saved when I found the bug. Here is the trace I get from mainline

[ 5680.135339] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.245528] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.355432] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.465521] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.575599] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.685817] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.795556] dev_pm_opp_get_voltage: Invalid parameters
[ 5680.905598] dev_pm_opp_get_voltage: Invalid parameters

> 
>>
>> Which happens because:
>>
>> unsigned long dev_pm_opp_get_voltage(struct dev_pm_opp *opp)
>> {
>> ..
>>         tmp_opp = rcu_dereference(opp);
>>         if (unlikely(IS_ERR_OR_NULL(tmp_opp)) || !tmp_opp->available)
>>                 pr_err("%s: Invalid parameters\n", __func__);
> 
> This %s should print routine name ..
> 
>>         else
>> ..
>>
>> Which happens when
>>
>>         opp = dev_pm_opp_find_freq_exact(cpufreq_device->cpu_dev, freq_hz,
>>                                          true);
>>
>> returns a an erroneous or NULL OPP or the opp is unavailable (in the above condition)
> 

Update: This returns an erroneous  OPP

> Please goto the depth of this thing, as I don't think it should happen.
> 
> Over that I was asking you if you have tested the solution Javi gave,
> because OPPs
> wouldn't have been initialized for other CPUs once policy->cpu goes down.
I did test this but we were working with the assumption that OPPs should be populated for all the CPUs and also that OPPs are lost for a hotplugged CPU which I see is not the case. 

We have looked at this more closely and found that problem lies in:

	freq = cpufreq_quick_get(cpumask_any(&cpufreq_device->allowed_cpus));

which returns a NULL frequency as we are not checking for online CPUs here. We shall come up with a fix for this. Many thanks for helping us with the investigation.

Regards, 
KP

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep Holla March 3, 2015, 3:26 p.m. UTC | #7
On Tue, Mar 3, 2015 at 3:09 PM, Kapileshwar Singh
<kapileshwar.singh@arm.com> wrote:
> On 03/03/15 13:07, Viresh Kumar wrote:

[...]

>> Please goto the depth of this thing, as I don't think it should happen.
>>
>> Over that I was asking you if you have tested the solution Javi gave,
>> because OPPs
>> wouldn't have been initialized for other CPUs once policy->cpu goes down.
> I did test this but we were working with the assumption that OPPs should be populated for all the CPUs and also that OPPs are lost for a hotplugged CPU which I see is not the case.
>
> We have looked at this more closely and found that problem lies in:
>
>         freq = cpufreq_quick_get(cpumask_any(&cpufreq_device->allowed_cpus));
>
> which returns a NULL frequency as we are not checking for online CPUs here. We shall come up with a fix for this. Many thanks for helping us with the investigation.
>

You can use any_online_cpu(..) instead of cpumask_any IMO

Regards,
Sudeep
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar March 3, 2015, 3:29 p.m. UTC | #8
On 3 March 2015 at 20:39, Kapileshwar Singh <kapileshwar.singh@arm.com> wrote:

> I did test this but we were working with the assumption that OPPs should be populated for all the CPUs and also that OPPs are lost for a hotplugged CPU which I see is not the case.

Then what did you test? My point here is, even with the latest patches
that you have
sent, you wouldn't be able to get the OPPs once policy->cpu goes down. So, how
did this worked for you ?

> We have looked at this more closely and found that problem lies in:
>
>         freq = cpufreq_quick_get(cpumask_any(&cpufreq_device->allowed_cpus));
>
> which returns a NULL frequency as we are not checking for online CPUs here. We shall come up with a fix for this. Many thanks for helping us with the investigation.

Right.
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Viresh Kumar March 3, 2015, 3:30 p.m. UTC | #9
On 3 March 2015 at 20:56, Sudeep Holla <sudeep.holla@arm.com> wrote:
> You can use any_online_cpu(..) instead of cpumask_any IMO

What about policy->cpu  ? :)
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sudeep Holla March 3, 2015, 3:33 p.m. UTC | #10
On 03/03/15 15:30, Viresh Kumar wrote:
> On 3 March 2015 at 20:56, Sudeep Holla <sudeep.holla@arm.com> wrote:
>> You can use any_online_cpu(..) instead of cpumask_any IMO
>
> What about policy->cpu  ? :)
>

We can, was not sure if they have access to policy in their thermal
driver. Just the first thought seeing one line of the code posted :)

Regards,
Sudeep

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kapileshwar Singh March 3, 2015, 3:34 p.m. UTC | #11
On 03/03/15 15:29, Viresh Kumar wrote:
> On 3 March 2015 at 20:39, Kapileshwar Singh <kapileshwar.singh@arm.com> wrote:
> 
>> I did test this but we were working with the assumption that OPPs should be populated for all the CPUs and also that OPPs are lost for a hotplugged CPU which I see is not the case.
> 
> Then what did you test? My point here is, even with the latest patches
> that you have
> sent, you wouldn't be able to get the OPPs once policy->cpu goes down. So, how
> did this worked for you ?

We were basing our fix on possibility of having OPPs for all the CPUs and we incorrectly attributed the erroneous OPP we got from dev_pm_opp_find_freq_exact to the missing OPPs in the other CPUs.

> 
>> We have looked at this more closely and found that problem lies in:
>>
>>         freq = cpufreq_quick_get(cpumask_any(&cpufreq_device->allowed_cpus));
>>
>> which returns a NULL frequency as we are not checking for online CPUs here. We shall come up with a fix for this. Many thanks for helping us with the investigation.
> 
> Right.
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/thermal/cpu_cooling.c b/drivers/thermal/cpu_cooling.c
index c4974144c787..e306d6bc3cf1 100644
--- a/drivers/thermal/cpu_cooling.c
+++ b/drivers/thermal/cpu_cooling.c
@@ -269,6 +269,9 @@  static int cpufreq_thermal_notifier(struct notifier_block *nb,
 		mutex_unlock(&cooling_cpufreq_lock);
 		break;
 
+	case CPUFREQ_UPDATE_POLICY_CPU:
+		update_cpu_device(policy->cpu);
+		break;
 	case CPUFREQ_CREATE_POLICY:
 		update_cpu_device(policy->cpu);
 		break;