diff mbox

cpuidle: mvebu: Fix the CPU PM notifier usage

Message ID 1424971248-29076-1-git-send-email-gregory.clement@free-electrons.com (mailing list archive)
State Not Applicable, archived
Delegated to: Rafael Wysocki
Headers show

Commit Message

Gregory CLEMENT Feb. 26, 2015, 5:20 p.m. UTC
As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
that cpu_pm_enter is not called twice on the same CPU before
cpu_pm_exit is called.". In the current code in case of failure when
calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
called whereas cpu_pm_enter() was called just before.

This patch moves the cpu_pm_exit() in order to balance the
cpu_pm_enter() calls.

Reported-by: Fulvio Benini <fbf@libero.it>
Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
---
 drivers/cpuidle/cpuidle-mvebu-v7.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Rafael J. Wysocki Feb. 26, 2015, 9:55 p.m. UTC | #1
On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
> that cpu_pm_enter is not called twice on the same CPU before
> cpu_pm_exit is called.". In the current code in case of failure when
> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
> called whereas cpu_pm_enter() was called just before.
> 
> This patch moves the cpu_pm_exit() in order to balance the
> cpu_pm_enter() calls.
> 
> Reported-by: Fulvio Benini <fbf@libero.it>
> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>

Should that go to "stable" too?  Which "stable" series it should go to if so?

> ---
>  drivers/cpuidle/cpuidle-mvebu-v7.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/cpuidle/cpuidle-mvebu-v7.c b/drivers/cpuidle/cpuidle-mvebu-v7.c
> index 38e68618513a..cefa07438ae1 100644
> --- a/drivers/cpuidle/cpuidle-mvebu-v7.c
> +++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
> @@ -37,11 +37,11 @@ static int mvebu_v7_enter_idle(struct cpuidle_device *dev,
>  		deepidle = true;
>  
>  	ret = mvebu_v7_cpu_suspend(deepidle);
> +	cpu_pm_exit();
> +
>  	if (ret)
>  		return ret;
>  
> -	cpu_pm_exit();
> -
>  	return index;
>  }
>  
>
Gregory CLEMENT Feb. 27, 2015, 9:39 a.m. UTC | #2
Hi Rafael,

On 26/02/2015 22:55, Rafael J. Wysocki wrote:
> On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
>> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
>> that cpu_pm_enter is not called twice on the same CPU before
>> cpu_pm_exit is called.". In the current code in case of failure when
>> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
>> called whereas cpu_pm_enter() was called just before.
>>
>> This patch moves the cpu_pm_exit() in order to balance the
>> cpu_pm_enter() calls.
>>
>> Reported-by: Fulvio Benini <fbf@libero.it>
>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
> 
> Should that go to "stable" too?  Which "stable" series it should go to if so?

Yes as it fixes a potential issue, you're right it should go
to "stable". The bug was here since the introduction of the driver
in 3.16.

Thanks,

Gregory

> 
>> ---
>>  drivers/cpuidle/cpuidle-mvebu-v7.c | 4 ++--
>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/cpuidle/cpuidle-mvebu-v7.c b/drivers/cpuidle/cpuidle-mvebu-v7.c
>> index 38e68618513a..cefa07438ae1 100644
>> --- a/drivers/cpuidle/cpuidle-mvebu-v7.c
>> +++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
>> @@ -37,11 +37,11 @@ static int mvebu_v7_enter_idle(struct cpuidle_device *dev,
>>  		deepidle = true;
>>  
>>  	ret = mvebu_v7_cpu_suspend(deepidle);
>> +	cpu_pm_exit();
>> +
>>  	if (ret)
>>  		return ret;
>>  
>> -	cpu_pm_exit();
>> -
>>  	return index;
>>  }
>>  
>>
>
Daniel Lezcano Feb. 27, 2015, 4:50 p.m. UTC | #3
On 02/26/2015 10:55 PM, Rafael J. Wysocki wrote:
> On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
>> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
>> that cpu_pm_enter is not called twice on the same CPU before
>> cpu_pm_exit is called.". In the current code in case of failure when
>> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
>> called whereas cpu_pm_enter() was called just before.
>>
>> This patch moves the cpu_pm_exit() in order to balance the
>> cpu_pm_enter() calls.
>>
>> Reported-by: Fulvio Benini <fbf@libero.it>
>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>
> Should that go to "stable" too?  Which "stable" series it should go to if so?

Hi Rafael,

do you mind if I take this fix in my tree ? There is the same issue for 
cpuidle-arm64.c I fixed.
Rafael J. Wysocki Feb. 27, 2015, 10:18 p.m. UTC | #4
On Friday, February 27, 2015 05:50:23 PM Daniel Lezcano wrote:
> On 02/26/2015 10:55 PM, Rafael J. Wysocki wrote:
> > On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
> >> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
> >> that cpu_pm_enter is not called twice on the same CPU before
> >> cpu_pm_exit is called.". In the current code in case of failure when
> >> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
> >> called whereas cpu_pm_enter() was called just before.
> >>
> >> This patch moves the cpu_pm_exit() in order to balance the
> >> cpu_pm_enter() calls.
> >>
> >> Reported-by: Fulvio Benini <fbf@libero.it>
> >> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
> >
> > Should that go to "stable" too?  Which "stable" series it should go to if so?
> 
> Hi Rafael,
> 
> do you mind if I take this fix in my tree ?

Not at all, please go ahead.

> There is the same issue for cpuidle-arm64.c I fixed.

OK
Daniel Lezcano March 3, 2015, 10:30 a.m. UTC | #5
On 02/27/2015 10:39 AM, Gregory CLEMENT wrote:
> Hi Rafael,
>
> On 26/02/2015 22:55, Rafael J. Wysocki wrote:
>> On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
>>> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
>>> that cpu_pm_enter is not called twice on the same CPU before
>>> cpu_pm_exit is called.". In the current code in case of failure when
>>> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
>>> called whereas cpu_pm_enter() was called just before.
>>>
>>> This patch moves the cpu_pm_exit() in order to balance the
>>> cpu_pm_enter() calls.
>>>
>>> Reported-by: Fulvio Benini <fbf@libero.it>
>>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>>
>> Should that go to "stable" too?  Which "stable" series it should go to if so?
>
> Yes as it fixes a potential issue, you're right it should go
> to "stable". The bug was here since the introduction of the driver
> in 3.16.

Hi Gregory,

actually the 'stable' rules state clearly:

"- It must fix a real bug that bothers people (not a, "This could be a 
problem..." type thing)."

You say "it fixes a potential issue", so no bug has been raise yet, right ?

>>> ---
>>>   drivers/cpuidle/cpuidle-mvebu-v7.c | 4 ++--
>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/drivers/cpuidle/cpuidle-mvebu-v7.c b/drivers/cpuidle/cpuidle-mvebu-v7.c
>>> index 38e68618513a..cefa07438ae1 100644
>>> --- a/drivers/cpuidle/cpuidle-mvebu-v7.c
>>> +++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
>>> @@ -37,11 +37,11 @@ static int mvebu_v7_enter_idle(struct cpuidle_device *dev,
>>>   		deepidle = true;
>>>
>>>   	ret = mvebu_v7_cpu_suspend(deepidle);
>>> +	cpu_pm_exit();
>>> +
>>>   	if (ret)
>>>   		return ret;
>>>
>>> -	cpu_pm_exit();
>>> -
>>>   	return index;
>>>   }
>>>
>>>
>>
>
>
Gregory CLEMENT March 3, 2015, 10:34 a.m. UTC | #6
Hi Rafael,



On 03/03/2015 11:30, Daniel Lezcano wrote:
> On 02/27/2015 10:39 AM, Gregory CLEMENT wrote:
>> Hi Rafael,
>>
>> On 26/02/2015 22:55, Rafael J. Wysocki wrote:
>>> On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
>>>> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
>>>> that cpu_pm_enter is not called twice on the same CPU before
>>>> cpu_pm_exit is called.". In the current code in case of failure when
>>>> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
>>>> called whereas cpu_pm_enter() was called just before.
>>>>
>>>> This patch moves the cpu_pm_exit() in order to balance the
>>>> cpu_pm_enter() calls.
>>>>
>>>> Reported-by: Fulvio Benini <fbf@libero.it>
>>>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>>>
>>> Should that go to "stable" too?  Which "stable" series it should go to if so?
>>
>> Yes as it fixes a potential issue, you're right it should go
>> to "stable". The bug was here since the introduction of the driver
>> in 3.16.
> 
> Hi Gregory,
> 
> actually the 'stable' rules state clearly:
> 
> "- It must fix a real bug that bothers people (not a, "This could be a 
> problem..." type thing)."
> 
> You say "it fixes a potential issue", so no bug has been raise yet, right ?

Indeed nobody claimed yet having a bug related to this issue.

Gregory


> 
>>>> ---
>>>>   drivers/cpuidle/cpuidle-mvebu-v7.c | 4 ++--
>>>>   1 file changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/drivers/cpuidle/cpuidle-mvebu-v7.c b/drivers/cpuidle/cpuidle-mvebu-v7.c
>>>> index 38e68618513a..cefa07438ae1 100644
>>>> --- a/drivers/cpuidle/cpuidle-mvebu-v7.c
>>>> +++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
>>>> @@ -37,11 +37,11 @@ static int mvebu_v7_enter_idle(struct cpuidle_device *dev,
>>>>   		deepidle = true;
>>>>
>>>>   	ret = mvebu_v7_cpu_suspend(deepidle);
>>>> +	cpu_pm_exit();
>>>> +
>>>>   	if (ret)
>>>>   		return ret;
>>>>
>>>> -	cpu_pm_exit();
>>>> -
>>>>   	return index;
>>>>   }
>>>>
>>>>
>>>
>>
>>
> 
>
Fulvio March 3, 2015, 10:52 a.m. UTC | #7
Gregory CLEMENT wrote:
> Hi Rafael,
>
>
>
> On 03/03/2015 11:30, Daniel Lezcano wrote:
>   
>> On 02/27/2015 10:39 AM, Gregory CLEMENT wrote:
>>     
>>> Hi Rafael,
>>>
>>> On 26/02/2015 22:55, Rafael J. Wysocki wrote:
>>>       
>>>> On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
>>>>         
>>>>> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
>>>>> that cpu_pm_enter is not called twice on the same CPU before
>>>>> cpu_pm_exit is called.". In the current code in case of failure when
>>>>> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
>>>>> called whereas cpu_pm_enter() was called just before.
>>>>>
>>>>> This patch moves the cpu_pm_exit() in order to balance the
>>>>> cpu_pm_enter() calls.
>>>>>
>>>>> Reported-by: Fulvio Benini <fbf@libero.it>
>>>>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>>>>>           
>>>> Should that go to "stable" too?  Which "stable" series it should go to if so?
>>>>         
>>> Yes as it fixes a potential issue, you're right it should go
>>> to "stable". The bug was here since the introduction of the driver
>>> in 3.16.
>>>       
>> Hi Gregory,
>>
>> actually the 'stable' rules state clearly:
>>
>> "- It must fix a real bug that bothers people (not a, "This could be a 
>> problem..." type thing)."
>>
>> You say "it fixes a potential issue", so no bug has been raise yet, right ?
>>     
>
> Indeed nobody claimed yet having a bug related to this issue.
>
> Gregory
>
>   
I reported the issue, but i cannot say if it's a real bug.
I had random kernel panics with a Netgear ReadyNAS RN102 (armada 370 cpu):
http://www.readynas.com/forum/viewtopic.php?f=20&t=78697&sid=14747617286d55ac27296cdee7a3f420&start=210#p452214

All i can say is that the system use the "armadaxp_idle" driver and 
works fine when running "stress --cpu 8" in background.
I asked Netgear to provide a firmware without the idle driver to confirm 
if it's the cause of the problem, but they did not answered.
Bye,
Fulvio
--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Lezcano March 3, 2015, 11:12 a.m. UTC | #8
On 03/03/2015 11:52 AM, Fulvio wrote:
> Gregory CLEMENT wrote:
>> Hi Rafael,
>>
>>
>>
>> On 03/03/2015 11:30, Daniel Lezcano wrote:
>>> On 02/27/2015 10:39 AM, Gregory CLEMENT wrote:
>>>> Hi Rafael,
>>>>
>>>> On 26/02/2015 22:55, Rafael J. Wysocki wrote:
>>>>> On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
>>>>>> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
>>>>>> that cpu_pm_enter is not called twice on the same CPU before
>>>>>> cpu_pm_exit is called.". In the current code in case of failure when
>>>>>> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
>>>>>> called whereas cpu_pm_enter() was called just before.
>>>>>>
>>>>>> This patch moves the cpu_pm_exit() in order to balance the
>>>>>> cpu_pm_enter() calls.
>>>>>>
>>>>>> Reported-by: Fulvio Benini <fbf@libero.it>
>>>>>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>>>>> Should that go to "stable" too?  Which "stable" series it should go
>>>>> to if so?
>>>> Yes as it fixes a potential issue, you're right it should go
>>>> to "stable". The bug was here since the introduction of the driver
>>>> in 3.16.
>>> Hi Gregory,
>>>
>>> actually the 'stable' rules state clearly:
>>>
>>> "- It must fix a real bug that bothers people (not a, "This could be
>>> a problem..." type thing)."
>>>
>>> You say "it fixes a potential issue", so no bug has been raise yet,
>>> right ?
>>
>> Indeed nobody claimed yet having a bug related to this issue.
>>
>> Gregory
>>
> I reported the issue, but i cannot say if it's a real bug.
> I had random kernel panics with a Netgear ReadyNAS RN102 (armada 370 cpu):
> http://www.readynas.com/forum/viewtopic.php?f=20&t=78697&sid=14747617286d55ac27296cdee7a3f420&start=210#p452214
>
>
> All i can say is that the system use the "armadaxp_idle" driver and
> works fine when running "stress --cpu 8" in background.

Hi Fulvio,

so IIUC, you suggest the stress test prevent the system to use cpuidle 
because of the busy cycles, right ?

Is it possible to have the kernel panic stack ?

Are you able to reproduce the kernel panic ? and test a new kernel ?

Thanks

   -- Daniel

> I asked Netgear to provide a firmware without the idle driver to confirm
> if it's the cause of the problem, but they did not answered.
> Bye,
> Fulvio
Gregory CLEMENT March 3, 2015, 12:51 p.m. UTC | #9
Hi Fulvio,

On 03/03/2015 11:52, Fulvio wrote:
> Gregory CLEMENT wrote:
>> Hi Rafael,
>>
>>
>>
>> On 03/03/2015 11:30, Daniel Lezcano wrote:
>>   
>>> On 02/27/2015 10:39 AM, Gregory CLEMENT wrote:
>>>     
>>>> Hi Rafael,
>>>>
>>>> On 26/02/2015 22:55, Rafael J. Wysocki wrote:
>>>>       
>>>>> On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
>>>>>         
>>>>>> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
>>>>>> that cpu_pm_enter is not called twice on the same CPU before
>>>>>> cpu_pm_exit is called.". In the current code in case of failure when
>>>>>> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
>>>>>> called whereas cpu_pm_enter() was called just before.
>>>>>>
>>>>>> This patch moves the cpu_pm_exit() in order to balance the
>>>>>> cpu_pm_enter() calls.
>>>>>>
>>>>>> Reported-by: Fulvio Benini <fbf@libero.it>
>>>>>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>>>>>>           
>>>>> Should that go to "stable" too?  Which "stable" series it should go to if so?
>>>>>         
>>>> Yes as it fixes a potential issue, you're right it should go
>>>> to "stable". The bug was here since the introduction of the driver
>>>> in 3.16.
>>>>       
>>> Hi Gregory,
>>>
>>> actually the 'stable' rules state clearly:
>>>
>>> "- It must fix a real bug that bothers people (not a, "This could be a 
>>> problem..." type thing)."
>>>
>>> You say "it fixes a potential issue", so no bug has been raise yet, right ?
>>>     
>>
>> Indeed nobody claimed yet having a bug related to this issue.
>>
>> Gregory
>>
>>   
> I reported the issue, but i cannot say if it's a real bug.
> I had random kernel panics with a Netgear ReadyNAS RN102 (armada 370 cpu):
> http://www.readynas.com/forum/viewtopic.php?f=20&t=78697&sid=14747617286d55ac27296cdee7a3f420&start=210#p452214

I didn't know you experimented random kernel panics and that you thought
it was related to the CPU Idle driver.

> 
> All i can say is that the system use the "armadaxp_idle" driver and 
> works fine when running "stress --cpu 8" in background.
> I asked Netgear to provide a firmware without the idle driver to confirm 
> if it's the cause of the problem, but they did not answered.

I think that if you disable all the state using by doing an

echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/stateNUM/disable

where NUM is the nuymbert of the state. Then it should disable the
cpuidle on the fly.

Daniel, is it correct?

Or if you can modify your bootargs, then just add cpuidle.off=1 to the kernel
command line.

Thanks,

Gregory

> Bye,
> Fulvio
>
Daniel Lezcano March 3, 2015, 1 p.m. UTC | #10
On 03/03/2015 01:51 PM, Gregory CLEMENT wrote:
> Hi Fulvio,
>
> On 03/03/2015 11:52, Fulvio wrote:
>> Gregory CLEMENT wrote:
>>> Hi Rafael,
>>>
>>>
>>>
>>> On 03/03/2015 11:30, Daniel Lezcano wrote:
>>>
>>>> On 02/27/2015 10:39 AM, Gregory CLEMENT wrote:
>>>>
>>>>> Hi Rafael,
>>>>>
>>>>> On 26/02/2015 22:55, Rafael J. Wysocki wrote:
>>>>>
>>>>>> On Thursday, February 26, 2015 06:20:48 PM Gregory CLEMENT wrote:
>>>>>>
>>>>>>> As stated in kernel/cpu_pm.c, "Platform is responsible for ensuring
>>>>>>> that cpu_pm_enter is not called twice on the same CPU before
>>>>>>> cpu_pm_exit is called.". In the current code in case of failure when
>>>>>>> calling mvebu_v7_cpu_suspend, the function cpu_pm_exit() is never
>>>>>>> called whereas cpu_pm_enter() was called just before.
>>>>>>>
>>>>>>> This patch moves the cpu_pm_exit() in order to balance the
>>>>>>> cpu_pm_enter() calls.
>>>>>>>
>>>>>>> Reported-by: Fulvio Benini <fbf@libero.it>
>>>>>>> Signed-off-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
>>>>>>>
>>>>>> Should that go to "stable" too?  Which "stable" series it should go to if so?
>>>>>>
>>>>> Yes as it fixes a potential issue, you're right it should go
>>>>> to "stable". The bug was here since the introduction of the driver
>>>>> in 3.16.
>>>>>
>>>> Hi Gregory,
>>>>
>>>> actually the 'stable' rules state clearly:
>>>>
>>>> "- It must fix a real bug that bothers people (not a, "This could be a
>>>> problem..." type thing)."
>>>>
>>>> You say "it fixes a potential issue", so no bug has been raise yet, right ?
>>>>
>>>
>>> Indeed nobody claimed yet having a bug related to this issue.
>>>
>>> Gregory
>>>
>>>
>> I reported the issue, but i cannot say if it's a real bug.
>> I had random kernel panics with a Netgear ReadyNAS RN102 (armada 370 cpu):
>> http://www.readynas.com/forum/viewtopic.php?f=20&t=78697&sid=14747617286d55ac27296cdee7a3f420&start=210#p452214
>
> I didn't know you experimented random kernel panics and that you thought
> it was related to the CPU Idle driver.
>
>>
>> All i can say is that the system use the "armadaxp_idle" driver and
>> works fine when running "stress --cpu 8" in background.
>> I asked Netgear to provide a firmware without the idle driver to confirm
>> if it's the cause of the problem, but they did not answered.
>
> I think that if you disable all the state using by doing an
>
> echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/stateNUM/disable
>
> where NUM is the nuymbert of the state. Then it should disable the
> cpuidle on the fly.
>
> Daniel, is it correct?

Yes, it is correct.

Make sure it is done on all cpus.

> Or if you can modify your bootargs, then just add cpuidle.off=1 to the kernel
> command line.
>
> Thanks,
>
> Gregory
>
>> Bye,
>> Fulvio
>>
>
>
Fulvio March 3, 2015, 2:58 p.m. UTC | #11
> I didn't know you experimented random kernel panics and that you thought
> it was related to the CPU Idle driver.
>
>   
>> All i can say is that the system use the "armadaxp_idle" driver and 
>> works fine when running "stress --cpu 8" in background.
>> I asked Netgear to provide a firmware without the idle driver to confirm 
>> if it's the cause of the problem, but they did not answered.
>>     
>
> I think that if you disable all the state using by doing an
>
> echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/stateNUM/disable
>
> where NUM is the nuymbert of the state. Then it should disable the
> cpuidle on the fly.
>   
Thanks, i'll try that as soon as possible (i gave back the unit to my 
client) and report back.

However, the description of the cpu_pm_enter function state:
"Must be called on the affected CPU with interrupts disabled.  Platform 
is responsible for ensuring that cpu_pm_enter is not called twice on the 
same CPU before cpu_pm_exit is called. Notified drivers can include VFP  
co-processor, interrupt controller and its PM extensions, local CPU  
timers context save/restore which shouldn't be interrupted. Hence it  
must be called with interrupts disabled."

and the point is: it that an invariant? Do current code and future code 
safely assume that cpu_pm_enter is not called twice?
For example if cpu_pm_enter do "context save" and cpu_pm_exit do 
"context restore", calling twice cpu_pm_enter will overwrite the 
previous saved context: is that safe in all circumstances?

I assume the rule " It must fix a real bug that bothers people (not a, 
"This could be a problem..." type thing)." is to avoid committing 
useless changes that may introduce new bugs, but i do not think that 
apply to this case: a bug report from an unknown user (me) should change 
nothing.

Bye,
Fulvio


--
To unsubscribe from this list: send the line "unsubscribe linux-pm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Daniel Lezcano March 3, 2015, 3:20 p.m. UTC | #12
On 03/03/2015 03:58 PM, Fulvio wrote:
>
>> I didn't know you experimented random kernel panics and that you thought
>> it was related to the CPU Idle driver.
>>
>>> All i can say is that the system use the "armadaxp_idle" driver and
>>> works fine when running "stress --cpu 8" in background.
>>> I asked Netgear to provide a firmware without the idle driver to
>>> confirm if it's the cause of the problem, but they did not answered.
>>
>> I think that if you disable all the state using by doing an
>>
>> echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/stateNUM/disable
>>
>> where NUM is the nuymbert of the state. Then it should disable the
>> cpuidle on the fly.
> Thanks, i'll try that as soon as possible (i gave back the unit to my
> client) and report back.
>
> However, the description of the cpu_pm_enter function state:
> "Must be called on the affected CPU with interrupts disabled.  Platform
> is responsible for ensuring that cpu_pm_enter is not called twice on the
> same CPU before cpu_pm_exit is called. Notified drivers can include VFP
> co-processor, interrupt controller and its PM extensions, local CPU
> timers context save/restore which shouldn't be interrupted. Hence it
> must be called with interrupts disabled."
>
> and the point is: it that an invariant? Do current code and future code
> safely assume that cpu_pm_enter is not called twice?

The fix is correct. The cpu_pm_enter/exit symmetry must be kept because 
we don't know what the notifier clients are doing.

The point is : can we send it to stable@ as a bug fix or not ?

> For example if cpu_pm_enter do "context save" and cpu_pm_exit do
> "context restore", calling twice cpu_pm_enter will overwrite the
> previous saved context: is that safe in all circumstances?

That is the drawback of the notifiers: the kernel provides a service and 
everyone plug something on it. The cpu_pm notifier are very low level 
functions, so the answer of your question is not obvious. I already 
checked all the cpuidle drivers if the potential bug you reported is 
there or not but apparently everything else is fine, cpu_pm_exit is 
always called after cpu_pm_enter.

As you stated, the API description implies cpu_pm_exit must be called 
after cpu_pm_enter. So the fix is right.

> I assume the rule " It must fix a real bug that bothers people (not a,
> "This could be a problem..." type thing)." is to avoid committing
> useless changes that may introduce new bugs, but i do not think that
> apply to this case: a bug report from an unknown user (me) should change
> nothing.

It would be perfect if we can succeed to reproduce the bug you are 
facing and check the patch fixes it. In this case, it goes to stable@ 
automatically.
Daniel Lezcano March 4, 2015, 2:34 p.m. UTC | #13
On 03/04/2015 03:53 PM, Rafael J. Wysocki wrote:
> On Tuesday, March 03, 2015 04:20:26 PM Daniel Lezcano wrote:
>> On 03/03/2015 03:58 PM, Fulvio wrote:

[ ... ]

>> The fix is correct. The cpu_pm_enter/exit symmetry must be kept because
>> we don't know what the notifier clients are doing.
>>
>> The point is : can we send it to stable@ as a bug fix or not ?
>
> Yes, we can, unless there's a risk of breaking things that cannot be
> ignored.

Ok, I added the stable@ tag.

Thanks
   -- Daniel
Rafael J. Wysocki March 4, 2015, 2:53 p.m. UTC | #14
On Tuesday, March 03, 2015 04:20:26 PM Daniel Lezcano wrote:
> On 03/03/2015 03:58 PM, Fulvio wrote:
> >
> >> I didn't know you experimented random kernel panics and that you thought
> >> it was related to the CPU Idle driver.
> >>
> >>> All i can say is that the system use the "armadaxp_idle" driver and
> >>> works fine when running "stress --cpu 8" in background.
> >>> I asked Netgear to provide a firmware without the idle driver to
> >>> confirm if it's the cause of the problem, but they did not answered.
> >>
> >> I think that if you disable all the state using by doing an
> >>
> >> echo 1 > /sys/devices/system/cpu/cpu0/cpuidle/stateNUM/disable
> >>
> >> where NUM is the nuymbert of the state. Then it should disable the
> >> cpuidle on the fly.
> > Thanks, i'll try that as soon as possible (i gave back the unit to my
> > client) and report back.
> >
> > However, the description of the cpu_pm_enter function state:
> > "Must be called on the affected CPU with interrupts disabled.  Platform
> > is responsible for ensuring that cpu_pm_enter is not called twice on the
> > same CPU before cpu_pm_exit is called. Notified drivers can include VFP
> > co-processor, interrupt controller and its PM extensions, local CPU
> > timers context save/restore which shouldn't be interrupted. Hence it
> > must be called with interrupts disabled."
> >
> > and the point is: it that an invariant? Do current code and future code
> > safely assume that cpu_pm_enter is not called twice?
> 
> The fix is correct. The cpu_pm_enter/exit symmetry must be kept because 
> we don't know what the notifier clients are doing.
> 
> The point is : can we send it to stable@ as a bug fix or not ?

Yes, we can, unless there's a risk of breaking things that cannot be
ignored.
diff mbox

Patch

diff --git a/drivers/cpuidle/cpuidle-mvebu-v7.c b/drivers/cpuidle/cpuidle-mvebu-v7.c
index 38e68618513a..cefa07438ae1 100644
--- a/drivers/cpuidle/cpuidle-mvebu-v7.c
+++ b/drivers/cpuidle/cpuidle-mvebu-v7.c
@@ -37,11 +37,11 @@  static int mvebu_v7_enter_idle(struct cpuidle_device *dev,
 		deepidle = true;
 
 	ret = mvebu_v7_cpu_suspend(deepidle);
+	cpu_pm_exit();
+
 	if (ret)
 		return ret;
 
-	cpu_pm_exit();
-
 	return index;
 }