diff mbox

Crash after 'reboot' due to 9be4fd2c7723a

Message ID 4080948.V5DMHR9bsk@vostro.rjw.lan (mailing list archive)
State New, archived
Headers show

Commit Message

Rafael J. Wysocki May 20, 2016, 9:20 p.m. UTC
On Friday, May 20, 2016 03:32:43 PM Fabio Estevam wrote:
> On Fri, May 20, 2016 at 3:05 PM, Fabio Estevam <festevam@gmail.com> wrote:
> > Rafael,
> >
> > Running the 'reboot' command works fine on a 4.5 kernel running on a
> > mx6ul platform (ARM single core SoC), but it crashes on 4.6.
> >
> > Running bisect I got 9be4fd2c7723a3057b0b39676 ("cpufreq: governor:
> > Replace timers with utilization update callbacks") as the first bad
> > commit.
> >
> > Below is the output crash log.
> >
> > Not sure if this issue is related to the problem reported by Guenter here:
> > http://lkml.iu.edu/hypermail/linux/kernel/1602.1/06075.html
> >
> > Any ideas?
> 
> If I rebuilt a kernel with SMP=n then the reboot command works as expected.

Does it work if you boot with maxcpus=1 or nosmp in the kernel command line?

Also, please try if the appended patch makes any difference.

---
 drivers/base/core.c       |    3 +++
 drivers/cpufreq/cpufreq.c |   11 -----------
 2 files changed, 3 insertions(+), 11 deletions(-)

Comments

Fabio Estevam May 20, 2016, 9:37 p.m. UTC | #1
Hi Rafael,

On Fri, May 20, 2016 at 6:20 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:

>> If I rebuilt a kernel with SMP=n then the reboot command works as expected.
>
> Does it work if you boot with maxcpus=1 or nosmp in the kernel command line?

No, it does not help.

> Also, please try if the appended patch makes any difference.

I tried it, and the console shows "Requesting system reboot" and stays
there forever. No more messages after that and the system does not
reboot.
Rafael J. Wysocki May 20, 2016, 9:48 p.m. UTC | #2
On Fri, May 20, 2016 at 11:37 PM, Fabio Estevam <festevam@gmail.com> wrote:
> Hi Rafael,
>
> On Fri, May 20, 2016 at 6:20 PM, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>
>>> If I rebuilt a kernel with SMP=n then the reboot command works as expected.
>>
>> Does it work if you boot with maxcpus=1 or nosmp in the kernel command line?
>
> No, it does not help.
>
>> Also, please try if the appended patch makes any difference.
>
> I tried it, and the console shows "Requesting system reboot" and stays
> there forever. No more messages after that and the system does not
> reboot.

So what about if you comment out the cpufreq_suspend() line in
device_shutdown() (after applying the patch)?
Fabio Estevam May 20, 2016, 9:55 p.m. UTC | #3
On Fri, May 20, 2016 at 6:48 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:

> So what about if you comment out the cpufreq_suspend() line in
> device_shutdown() (after applying the patch)?

Then it reboots fine :-) Thanks
Rafael J. Wysocki May 20, 2016, 9:59 p.m. UTC | #4
On Fri, May 20, 2016 at 11:55 PM, Fabio Estevam <festevam@gmail.com> wrote:
> On Fri, May 20, 2016 at 6:48 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
>> So what about if you comment out the cpufreq_suspend() line in
>> device_shutdown() (after applying the patch)?
>
> Then it reboots fine :-) Thanks

But that probably means that irq_work_sync() doesn't work on your system.

Can you please try to boot and change the cpufreq governor to
performance and back and see if that works?

Is this an SMP board, actually?
Rafael J. Wysocki May 20, 2016, 10:27 p.m. UTC | #5
On Friday, May 20, 2016 11:59:06 PM Rafael J. Wysocki wrote:
> On Fri, May 20, 2016 at 11:55 PM, Fabio Estevam <festevam@gmail.com> wrote:
> > On Fri, May 20, 2016 at 6:48 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> >> So what about if you comment out the cpufreq_suspend() line in
> >> device_shutdown() (after applying the patch)?
> >
> > Then it reboots fine :-) Thanks
> 
> But that probably means that irq_work_sync() doesn't work on your system.
> 
> Can you please try to boot and change the cpufreq governor to
> performance and back and see if that works?
> 
> Is this an SMP board, actually?

No, it isn't (checked the report again).

Silly question maybe: Is CONFIG_IRQ_WORK set in your .config?
Rafael J. Wysocki May 20, 2016, 10:31 p.m. UTC | #6
On Saturday, May 21, 2016 12:27:54 AM Rafael J. Wysocki wrote:
> On Friday, May 20, 2016 11:59:06 PM Rafael J. Wysocki wrote:
> > On Fri, May 20, 2016 at 11:55 PM, Fabio Estevam <festevam@gmail.com> wrote:
> > > On Fri, May 20, 2016 at 6:48 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> > >
> > >> So what about if you comment out the cpufreq_suspend() line in
> > >> device_shutdown() (after applying the patch)?
> > >
> > > Then it reboots fine :-) Thanks
> > 
> > But that probably means that irq_work_sync() doesn't work on your system.
> > 
> > Can you please try to boot and change the cpufreq governor to
> > performance and back and see if that works?
> > 
> > Is this an SMP board, actually?
> 
> No, it isn't (checked the report again).
> 
> Silly question maybe: Is CONFIG_IRQ_WORK set in your .config?

Well, the kernel wouldn't have built for you without it, so it must be set.
Fabio Estevam May 21, 2016, 12:12 a.m. UTC | #7
On Fri, May 20, 2016 at 6:59 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:

> Can you please try to boot and change the cpufreq governor to
> performance and back and see if that works?

I cannot change to governor with 9be4fd2c7723a305.

After running 'cat performance >
/sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
the system hangs silently.

Reverting 9be4fd2c7723a305 allows me to change to performace.

> Is this an SMP board, actually?

Yes, this is a single core CortexA7.
Rafael J. Wysocki May 21, 2016, 12:30 a.m. UTC | #8
On Sat, May 21, 2016 at 2:12 AM, Fabio Estevam <festevam@gmail.com> wrote:
> On Fri, May 20, 2016 at 6:59 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>
>> Can you please try to boot and change the cpufreq governor to
>> performance and back and see if that works?
>
> I cannot change to governor with 9be4fd2c7723a305.
>
> After running 'cat performance >
> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
> the system hangs silently.

So this means that irq_work doesn't really work on your system at all.
It looks like queuing up an irq_work sort of works (because the
IRQ_WORK_BUSY), but then the queue is never processed.

> Reverting 9be4fd2c7723a305 allows me to change to performace.

What exactly do you mean by reverting here?  It surely cannot be
reverted by itself and there are many things that depend on it.

>> Is this an SMP board, actually?
>
> Yes, this is a single core CortexA7.

Hmm.  Single core means non-SMP rather?
Rafael J. Wysocki May 21, 2016, 12:31 a.m. UTC | #9
On Sat, May 21, 2016 at 2:30 AM, Rafael J. Wysocki <rafael@kernel.org> wrote:
> On Sat, May 21, 2016 at 2:12 AM, Fabio Estevam <festevam@gmail.com> wrote:
>> On Fri, May 20, 2016 at 6:59 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:
>>
>>> Can you please try to boot and change the cpufreq governor to
>>> performance and back and see if that works?
>>
>> I cannot change to governor with 9be4fd2c7723a305.
>>
>> After running 'cat performance >
>> /sys/devices/system/cpu/cpu0/cpufreq/scaling_governor'
>> the system hangs silently.
>
> So this means that irq_work doesn't really work on your system at all.
> It looks like queuing up an irq_work sort of works (because the
> IRQ_WORK_BUSY), but then the queue is never processed.

I wanted to say "because the IRQ_WORK_BUSY flag is set", sorry.
Fabio Estevam May 21, 2016, 12:36 a.m. UTC | #10
On Fri, May 20, 2016 at 9:30 PM, Rafael J. Wysocki <rafael@kernel.org> wrote:

> What exactly do you mean by reverting here?  It surely cannot be
> reverted by itself and there are many things that depend on it.

Yes, it cannot be reverted cleanly on 4.6.

If I do 'git checkout 9be4fd2c7723a30' and then 'git revert
9be4fd2c7723a30' then the reboot command works.

>>> Is this an SMP board, actually?
>>
>> Yes, this is a single core CortexA7.
>
> Hmm.  Single core means non-SMP rather?

CONFIG_SMP can be set or not in the kernel config. With CONFIG_SMP=n
the issue does not happen.
diff mbox

Patch

Index: linux-pm/drivers/cpufreq/cpufreq.c
===================================================================
--- linux-pm.orig/drivers/cpufreq/cpufreq.c
+++ linux-pm/drivers/cpufreq/cpufreq.c
@@ -27,7 +27,6 @@ 
 #include <linux/mutex.h>
 #include <linux/slab.h>
 #include <linux/suspend.h>
-#include <linux/syscore_ops.h>
 #include <linux/tick.h>
 #include <trace/events/power.h>
 
@@ -2556,14 +2555,6 @@  int cpufreq_unregister_driver(struct cpu
 }
 EXPORT_SYMBOL_GPL(cpufreq_unregister_driver);
 
-/*
- * Stop cpufreq at shutdown to make sure it isn't holding any locks
- * or mutexes when secondary CPUs are halted.
- */
-static struct syscore_ops cpufreq_syscore_ops = {
-	.shutdown = cpufreq_suspend,
-};
-
 struct kobject *cpufreq_global_kobject;
 EXPORT_SYMBOL(cpufreq_global_kobject);
 
@@ -2575,8 +2566,6 @@  static int __init cpufreq_core_init(void
 	cpufreq_global_kobject = kobject_create_and_add("cpufreq", &cpu_subsys.dev_root->kobj);
 	BUG_ON(!cpufreq_global_kobject);
 
-	register_syscore_ops(&cpufreq_syscore_ops);
-
 	return 0;
 }
 core_initcall(cpufreq_core_init);
Index: linux-pm/drivers/base/core.c
===================================================================
--- linux-pm.orig/drivers/base/core.c
+++ linux-pm/drivers/base/core.c
@@ -10,6 +10,7 @@ 
  *
  */
 
+#include <linux/cpufreq.h>
 #include <linux/device.h>
 #include <linux/err.h>
 #include <linux/fwnode.h>
@@ -2042,6 +2043,8 @@  void device_shutdown(void)
 {
 	struct device *dev, *parent;
 
+	cpufreq_suspend();
+
 	spin_lock(&devices_kset->list_lock);
 	/*
 	 * Walk the devices list backward, shutting down each in turn.