Message ID | 1452688405-15087-1-git-send-email-j-keerthy@ti.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
* Keerthy <j-keerthy@ti.com> wrote: > orderly_poweroff is triggered when a graceful shutdown > of system is desired. This may be used in many critical states of the > kernel such as when subsystems detects conditions such as critical > temperature conditions. However, in certain conditions in system > boot up sequences like those in the middle of driver probes being > initiated, userspace will be unable to power off the system in a clean > manner and leaves the system in a critical state. In cases like these, > the /sbin/poweroff will return success (having forked off to attempt > powering off the system. However, the system overall will fail to > completely poweroff (since other modules will be probed) and the system > is still functional with no userspace (since that would have shut itself > off). > > However, there is no clean way of detecting such failure of userspace > powering off the system. In such scenarios, it is necessary for a backup > workqueue to be able to force a shutdown of the system when orderly > shutdown is not successful after a configurable time period. > > Signed-off-by: Keerthy <j-keerthy@ti.com> > Suggested-by: Eduardo Valentin <edubezval@gmail.com> > Reported-by: Nishanth Menon <nm@ti.com> > --- > Links to previous discussion can be found here: > > http://www.spinics.net/lists/linux-omap/msg124925.html > > Boot tested on DRA7. > > changes in v2: > > * Changed #ifdef to #if CONFIG_SHUTDOWN_BACKUP_DELAY_MS > > arch/Kconfig | 7 +++++++ > kernel/reboot.c | 23 ++++++++++++++++++----- > 2 files changed, 25 insertions(+), 5 deletions(-) > > Index: linux/arch/Kconfig > =================================================================== > --- linux.orig/arch/Kconfig 2016-01-11 15:26:07.732173131 +0530 > +++ linux/arch/Kconfig 2016-01-11 15:26:07.728173205 +0530 > @@ -37,6 +37,18 @@ > def_bool y > depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !PPC64 > > +config SHUTDOWN_BACKUP_DELAY_MS > + int "Backup shutdown delay in milli-seconds" > + default 0 > + help > + The number of milliseconds to delay before backup workqueue > + executes attempting to poweroff the system after the > + orderly_poweroff function has failed to complete. > + > + If set to 0, the backup workqueue is not active. The value > + should be conservatively configured based on userspace latencies > + expected for a given system. I don't really understand this. In what circumstances can a reboot fail? I think that is what should be fixed: a reboot should never fail, instead of introducing some sort of fragile timeout based method. Thanks, Ingo
Hi Ingo, On Thursday 14 January 2016 02:35 PM, Ingo Molnar wrote: > > * Keerthy <j-keerthy@ti.com> wrote: > >> orderly_poweroff is triggered when a graceful shutdown >> of system is desired. This may be used in many critical states of the >> kernel such as when subsystems detects conditions such as critical >> temperature conditions. However, in certain conditions in system >> boot up sequences like those in the middle of driver probes being >> initiated, userspace will be unable to power off the system in a clean >> manner and leaves the system in a critical state. In cases like these, >> the /sbin/poweroff will return success (having forked off to attempt >> powering off the system. However, the system overall will fail to >> completely poweroff (since other modules will be probed) and the system >> is still functional with no userspace (since that would have shut itself >> off). >> >> However, there is no clean way of detecting such failure of userspace >> powering off the system. In such scenarios, it is necessary for a backup >> workqueue to be able to force a shutdown of the system when orderly >> shutdown is not successful after a configurable time period. >> >> Signed-off-by: Keerthy <j-keerthy@ti.com> >> Suggested-by: Eduardo Valentin <edubezval@gmail.com> >> Reported-by: Nishanth Menon <nm@ti.com> >> --- >> Links to previous discussion can be found here: >> >> http://www.spinics.net/lists/linux-omap/msg124925.html >> >> Boot tested on DRA7. >> >> changes in v2: >> >> * Changed #ifdef to #if CONFIG_SHUTDOWN_BACKUP_DELAY_MS >> >> arch/Kconfig | 7 +++++++ >> kernel/reboot.c | 23 ++++++++++++++++++----- >> 2 files changed, 25 insertions(+), 5 deletions(-) >> >> Index: linux/arch/Kconfig >> =================================================================== >> --- linux.orig/arch/Kconfig 2016-01-11 15:26:07.732173131 +0530 >> +++ linux/arch/Kconfig 2016-01-11 15:26:07.728173205 +0530 >> @@ -37,6 +37,18 @@ >> def_bool y >> depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !PPC64 >> >> +config SHUTDOWN_BACKUP_DELAY_MS >> + int "Backup shutdown delay in milli-seconds" >> + default 0 >> + help >> + The number of milliseconds to delay before backup workqueue >> + executes attempting to poweroff the system after the >> + orderly_poweroff function has failed to complete. >> + >> + If set to 0, the backup workqueue is not active. The value >> + should be conservatively configured based on userspace latencies >> + expected for a given system. > > I don't really understand this. In what circumstances can a reboot fail? > > I think that is what should be fixed: a reboot should never fail, instead of > introducing some sort of fragile timeout based method. Here is the complete description of the scenario which was reported by Nishanth who encountered the issue. The link has bootlogs and description of the exact case which led to this patch. http://www.spinics.net/lists/linux-omap/msg124923.html Regards, Keerthy > > Thanks, > > Ingo >
* Keerthy <a0393675@ti.com> wrote: > Hi Ingo, > > On Thursday 14 January 2016 02:35 PM, Ingo Molnar wrote: > > > >* Keerthy <j-keerthy@ti.com> wrote: > > > >>orderly_poweroff is triggered when a graceful shutdown > >>of system is desired. This may be used in many critical states of the > >>kernel such as when subsystems detects conditions such as critical > >>temperature conditions. However, in certain conditions in system > >>boot up sequences like those in the middle of driver probes being > >>initiated, userspace will be unable to power off the system in a clean > >>manner and leaves the system in a critical state. In cases like these, > >>the /sbin/poweroff will return success (having forked off to attempt > >>powering off the system. However, the system overall will fail to > >>completely poweroff (since other modules will be probed) and the system > >>is still functional with no userspace (since that would have shut itself > >>off). > >> > >>However, there is no clean way of detecting such failure of userspace > >>powering off the system. In such scenarios, it is necessary for a backup > >>workqueue to be able to force a shutdown of the system when orderly > >>shutdown is not successful after a configurable time period. > >> > >>Signed-off-by: Keerthy <j-keerthy@ti.com> > >>Suggested-by: Eduardo Valentin <edubezval@gmail.com> > >>Reported-by: Nishanth Menon <nm@ti.com> > >>--- > >>Links to previous discussion can be found here: > >> > >>http://www.spinics.net/lists/linux-omap/msg124925.html > >> > >>Boot tested on DRA7. > >> > >>changes in v2: > >> > >> * Changed #ifdef to #if CONFIG_SHUTDOWN_BACKUP_DELAY_MS > >> > >> arch/Kconfig | 7 +++++++ > >> kernel/reboot.c | 23 ++++++++++++++++++----- > >> 2 files changed, 25 insertions(+), 5 deletions(-) > >> > >>Index: linux/arch/Kconfig > >>=================================================================== > >>--- linux.orig/arch/Kconfig 2016-01-11 15:26:07.732173131 +0530 > >>+++ linux/arch/Kconfig 2016-01-11 15:26:07.728173205 +0530 > >>@@ -37,6 +37,18 @@ > >> def_bool y > >> depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !PPC64 > >> > >>+config SHUTDOWN_BACKUP_DELAY_MS > >>+ int "Backup shutdown delay in milli-seconds" > >>+ default 0 > >>+ help > >>+ The number of milliseconds to delay before backup workqueue > >>+ executes attempting to poweroff the system after the > >>+ orderly_poweroff function has failed to complete. > >>+ > >>+ If set to 0, the backup workqueue is not active. The value > >>+ should be conservatively configured based on userspace latencies > >>+ expected for a given system. > > > >I don't really understand this. In what circumstances can a reboot fail? > > > >I think that is what should be fixed: a reboot should never fail, instead of > >introducing some sort of fragile timeout based method. > > Here is the complete description of the scenario which was reported by Nishanth > who encountered the issue. The link has bootlogs and description of the exact > case which led to this patch. > > http://www.spinics.net/lists/linux-omap/msg124923.html it's a reply in the middle of a discussion ... What I managed to decode is that this: static int __orderly_poweroff(bool force) { int ret; ret = run_cmd(poweroff_cmd); if (ret && force) { pr_warn("Failed to start orderly shutdown: forcing the issue\n"); /* * I guess this should try to kick off some daemon to sync and * poweroff asap. Or not even bother syncing if we're doing an * emergency shutdown? */ emergency_sync(); kernel_power_off(); } return ret; } could fail to actually power the system off, if the run_cmd(poweroff_cmd) 'succeeds', but due to a user-space bug it does not actually call the real poweroff system call? Thanks, Ingo
Hi Ingo, On Thursday 14 January 2016 03:39 PM, Ingo Molnar wrote: > > * Keerthy <a0393675@ti.com> wrote: > >> Hi Ingo, >> >> On Thursday 14 January 2016 02:35 PM, Ingo Molnar wrote: >>> >>> * Keerthy <j-keerthy@ti.com> wrote: >>> >>>> orderly_poweroff is triggered when a graceful shutdown >>>> of system is desired. This may be used in many critical states of the >>>> kernel such as when subsystems detects conditions such as critical >>>> temperature conditions. However, in certain conditions in system >>>> boot up sequences like those in the middle of driver probes being >>>> initiated, userspace will be unable to power off the system in a clean >>>> manner and leaves the system in a critical state. In cases like these, >>>> the /sbin/poweroff will return success (having forked off to attempt >>>> powering off the system. However, the system overall will fail to >>>> completely poweroff (since other modules will be probed) and the system >>>> is still functional with no userspace (since that would have shut itself >>>> off). >>>> >>>> However, there is no clean way of detecting such failure of userspace >>>> powering off the system. In such scenarios, it is necessary for a backup >>>> workqueue to be able to force a shutdown of the system when orderly >>>> shutdown is not successful after a configurable time period. >>>> >>>> Signed-off-by: Keerthy <j-keerthy@ti.com> >>>> Suggested-by: Eduardo Valentin <edubezval@gmail.com> >>>> Reported-by: Nishanth Menon <nm@ti.com> >>>> --- >>>> Links to previous discussion can be found here: >>>> >>>> http://www.spinics.net/lists/linux-omap/msg124925.html >>>> >>>> Boot tested on DRA7. >>>> >>>> changes in v2: >>>> >>>> * Changed #ifdef to #if CONFIG_SHUTDOWN_BACKUP_DELAY_MS >>>> >>>> arch/Kconfig | 7 +++++++ >>>> kernel/reboot.c | 23 ++++++++++++++++++----- >>>> 2 files changed, 25 insertions(+), 5 deletions(-) >>>> >>>> Index: linux/arch/Kconfig >>>> =================================================================== >>>> --- linux.orig/arch/Kconfig 2016-01-11 15:26:07.732173131 +0530 >>>> +++ linux/arch/Kconfig 2016-01-11 15:26:07.728173205 +0530 >>>> @@ -37,6 +37,18 @@ >>>> def_bool y >>>> depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !PPC64 >>>> >>>> +config SHUTDOWN_BACKUP_DELAY_MS >>>> + int "Backup shutdown delay in milli-seconds" >>>> + default 0 >>>> + help >>>> + The number of milliseconds to delay before backup workqueue >>>> + executes attempting to poweroff the system after the >>>> + orderly_poweroff function has failed to complete. >>>> + >>>> + If set to 0, the backup workqueue is not active. The value >>>> + should be conservatively configured based on userspace latencies >>>> + expected for a given system. >>> >>> I don't really understand this. In what circumstances can a reboot fail? >>> >>> I think that is what should be fixed: a reboot should never fail, instead of >>> introducing some sort of fragile timeout based method. >> >> Here is the complete description of the scenario which was reported by Nishanth >> who encountered the issue. The link has bootlogs and description of the exact >> case which led to this patch. >> >> http://www.spinics.net/lists/linux-omap/msg124923.html > > it's a reply in the middle of a discussion ... > > What I managed to decode is that this: > > static int __orderly_poweroff(bool force) > { > int ret; > > ret = run_cmd(poweroff_cmd); > > if (ret && force) { > pr_warn("Failed to start orderly shutdown: forcing the issue\n"); > > /* > * I guess this should try to kick off some daemon to sync and > * poweroff asap. Or not even bother syncing if we're doing an > * emergency shutdown? > */ > emergency_sync(); > kernel_power_off(); > } > > return ret; > } > > could fail to actually power the system off, if the run_cmd(poweroff_cmd) > 'succeeds', but due to a user-space bug it does not actually call the real > poweroff system call? > I tried to simulate the issue. In the probe function of drivers/thermal/ti-soc-thermal/ti-bandgap.c ti_bandgap_probe i call orderly_poweroff(true); This is while driver probes are still on going. I observe that ret = run_cmd(poweroff_cmd); ret is a non-zero value and we enter the if condition: Even after the emergency_sync(); kernel_power_off(); calls the console remained active in weird state. > Thanks, > > Ingo >
* Keerthy <a0393675@ti.com> wrote: > I tried to simulate the issue. > > In the probe function of drivers/thermal/ti-soc-thermal/ti-bandgap.c > ti_bandgap_probe i call > > orderly_poweroff(true); > > This is while driver probes are still on going. I observe that > ret = run_cmd(poweroff_cmd); > > ret is a non-zero value and we enter the if condition: > > Even after the > > emergency_sync(); > kernel_power_off(); > > calls > > the console remained active in weird state. Now _that_ is clearly an architecture bug that should not be papered over ... If kernel_power_off() is called then the system should power off. No ifs and whens. Thanks, Ingo
> > If kernel_power_off() is called then the system should power off. No ifs and > whens. Even if it doesn't the watchdog should kill it. That is broken on some platforms on the watchdog side as the watchdog shuts down during our power off callbacks - because the system firmware is too stupid to reset the watchdog as it powers back up (so keeps rebooting). If you watchdog and firmware function properly you shouldn't even have to care if you crash during the kernel power off. Alan
On Thu, Jan 14, 2016 at 12:23:54PM +0100, Ingo Molnar wrote: > * Keerthy <a0393675@ti.com> wrote: > > I tried to simulate the issue. > > > > In the probe function of drivers/thermal/ti-soc-thermal/ti-bandgap.c > > ti_bandgap_probe i call > > > > orderly_poweroff(true); > > > > This is while driver probes are still on going. I observe that > > ret = run_cmd(poweroff_cmd); > > > > ret is a non-zero value and we enter the if condition: > > > > Even after the > > > > emergency_sync(); > > kernel_power_off(); > > > > calls > > > > the console remained active in weird state. > > Now _that_ is clearly an architecture bug that should not be papered over ... No, it's not an architecture bug - it's a platform bug. The ARM architecture has no standard way to control CPU reset or system power, all that is up to the platform. > If kernel_power_off() is called then the system should power off. No > ifs and whens. There definitely are ifs and whens. Only if the platform has support, and when that support works. If the platform does not provide such support, or that support is broken, there's nothing that can be done at the architecture level. Looking at the log given via the message that was referred to in a previous message in this thread, it looks like userspace fails to get to the point of calling into the kernel: 159.636627] rc S c0756104 0 527 1 0x00000000 159.643027] [<c0756104>] (__schedule) from [<c0756778>] (schedule+0x40/0x98) 159.650108] [<c0756778>] (schedule) from [<c0193c50>] (pipe_wait+0x60/0x9c) 159.657102] [<c0193c50>] (pipe_wait) from [<c0193cc0>] (wait_for_partner+0x34/0x 159.664792] [<c0193cc0>] (wait_for_partner) from [<c01947fc>] (fifo_open+0x1a4/0 159.672658] [<c01947fc>] (fifo_open) from [<c018a5c0>] (do_dentry_open+0x1c8/0x3 159.680351] [<c018a5c0>] (do_dentry_open) from [<c019855c>] (do_last+0x64c/0xce0 159.687867] [<c019855c>] (do_last) from [<c019adf8>] (path_openat+0x80/0x608) 159.695036] [<c019adf8>] (path_openat) from [<c019be34>] (do_filp_open+0x2c/0x88 159.702555] [<c019be34>] (do_filp_open) from [<c018b94c>] (do_sys_open+0xfc/0x1c 159.710158] [<c018b94c>] (do_sys_open) from [<c00102a0>] (ret_fast_syscall+0x0/0 and later on, it's still there: 219.253041] rc S c0756104 0 527 1 0x00000000 219.259443] [<c0756104>] (__schedule) from [<c0756778>] (schedule+0x40/0x98) 219.266524] [<c0756778>] (schedule) from [<c0193c50>] (pipe_wait+0x60/0x9c) 219.273516] [<c0193c50>] (pipe_wait) from [<c0193cc0>] (wait_for_partner+0x34/0x 219.281206] [<c0193cc0>] (wait_for_partner) from [<c01947fc>] (fifo_open+0x1a4/0 219.289071] [<c01947fc>] (fifo_open) from [<c018a5c0>] (do_dentry_open+0x1c8/0x3 219.296762] [<c018a5c0>] (do_dentry_open) from [<c019855c>] (do_last+0x64c/0xce0 The 'rc' script in sysvinit/upstart is normally responsible for walking through /etc/rc?.d/* running the scripts in order. It looks like this has wedged, and so it's not getting anywhere near to asking the kernel to shut down. That's even more confirmed by there being no "Power down" message in the log, which is printed by kernel_power_off(). So I'm not convinced that the pointed to log is actually an illustration of the problem that its being discussed here: it looks to me like some other failure, and it looks like 'rc' is stuck trying to open a fifo. It would be nice to see an example of a log where we have proof that kernel_power_off() was called (via the "Power down" message being in the log.) In any case, at the architecture level, if a platform code fails to reboot, we print "Reboot failed -- System halted", disable IRQs and spin. We don't print anything if a platform hasn't provided a "pm_power_off()" hook, or that hook fails though - we just fall back to the generic code which does a do_exit(0) for the caller. I think some people have their power off/reboot stuff as part of their watchdog driver, which can be a loadable module - if the module isn't loaded, then these facilities are not available.
* Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Thu, Jan 14, 2016 at 12:23:54PM +0100, Ingo Molnar wrote: > > * Keerthy <a0393675@ti.com> wrote: > > > I tried to simulate the issue. > > > > > > In the probe function of drivers/thermal/ti-soc-thermal/ti-bandgap.c > > > ti_bandgap_probe i call > > > > > > orderly_poweroff(true); > > > > > > This is while driver probes are still on going. I observe that > > > ret = run_cmd(poweroff_cmd); > > > > > > ret is a non-zero value and we enter the if condition: > > > > > > Even after the > > > > > > emergency_sync(); > > > kernel_power_off(); > > > > > > calls > > > > > > the console remained active in weird state. > > > > Now _that_ is clearly an architecture bug that should not be papered over ... > > No, it's not an architecture bug - it's a platform bug. [...] It's an 'architecture bug' in Linux kernel speak: all stuff that is traditionally under arch/*. The 'arch' in that directory name derives from 'architecture'. kernel_power_off() is a traditionally architecture level (not core kernel level and not driver level) code. > [...] The ARM architecture has no standard way to control CPU reset or system > power, all that is up to the platform. ... and platform code is typically part of arch/ as well. FYI, you are making an unnecessarily obtuse argument by insisting on the architecture != platform triviality and you also injected an uncalled for patronizing tone into this discussion by pretending that I don't know that distinction. It's sad. > > If kernel_power_off() is called then the system should power off. No ifs and > > whens. > > There definitely are ifs and whens. Only if the platform has support, and when > that support works. And that is precisely what I meant: in a correctly working kernel, with correctly working hardware, in a correctly working universe, the core kernel expects kernel_power_off() to never 'fail'. As the name suggests. Yes, bugs in user-space, kernel-space, hardware and designed buggy hardware might prevent a reboot - as usual. I.e. I NAK this patch from a core kernel perspective, we don't add such workarounds without a lot more information about why it's the right thing to do. Thanks, Ingo
* One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote: > > If kernel_power_off() is called then the system should power off. No ifs and > > whens. > > Even if it doesn't the watchdog should kill it. > > That is broken on some platforms on the watchdog side as the > watchdog shuts down during our power off callbacks - because the system > firmware is too stupid to reset the watchdog as it powers back up (so > keeps rebooting). > > If you watchdog and firmware function properly you shouldn't even have to > care if you crash during the kernel power off. That's a good point as well - if the system is 'stuck' for some notion of stuck, then watchdog drivers can help. Here it's unclear whether user-space even called the sys_reboot() system call. Thanks, Ingo
On Fri, Jan 15, 2016 at 11:13:12AM +0100, Ingo Molnar wrote: > FYI, you are making an unnecessarily obtuse argument by insisting on the > architecture != platform triviality and you also injected an uncalled for > patronizing tone into this discussion by pretending that I don't know that > distinction. It's sad. I was trying to be helpful, but your attitude seems to always one of finding some way to turn a reasonable discussion into an ad hominem attack. You've just proven that you're totally impossible to work with. I won't make the mistake of replying to one of your messages again.
On 01/15/2016 12:14 PM, Ingo Molnar wrote: > > * One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote: > >>> If kernel_power_off() is called then the system should power off. No ifs and >>> whens. >> >> Even if it doesn't the watchdog should kill it. >> >> That is broken on some platforms on the watchdog side as the >> watchdog shuts down during our power off callbacks - because the system >> firmware is too stupid to reset the watchdog as it powers back up (so >> keeps rebooting). >> >> If you watchdog and firmware function properly you shouldn't even have to >> care if you crash during the kernel power off. > > That's a good point as well - if the system is 'stuck' for some notion of stuck, > then watchdog drivers can help. > Seems ARM doesn't have endless loop implemented in machine_power_off() - so, not too much chances for Watchdog to fire. void machine_power_off(void) { local_irq_disable(); smp_send_stop(); if (pm_power_off) pm_power_off(); --- endless loop ? --- or restart ? } [and even if it will be there - 20-30sec is usual timeout for Watchdog and this enough time to burn the system in case of thermal emergency poweroff :(] > Here it's unclear whether user-space even called the sys_reboot() system call. > That's true - original log [1] has Nov 30 11:19:22 [ 5.942769] thermal thermal_zone3: critical temperature reached(108 C),shutting down [...] Nov 30 11:19:24 [ 7.387900] ahci 4a140000.sata: flags: 64bit ncq sntf stag pm led clo only pmp pio slum part ccc apst Nov 30 11:19:24 INIT: Switching to runlevel: 0 Nov 30 11:19:24 INIT: Sending processes the TERM signal and there are no [ 220.004522] reboot: Power down Also, It's not the first time this part of code is discussed (thermal emergency poweroff) [2], so the good question, as for me, is it really required and safe to use orderly_poweroff() in case of thermal emergency poweroff ([3] as example)? In general, this kind of use case can be simulated using SysRq on any arch - [3.290034] Freeing unused kernel memory: 492K (c0a67000 - c0ae2000) INIT: version 2.88 booting Starting udev ^^ The issue most probably might happens when system in the process of loading modules So, once modules loading process is started - fire Sysrq "poweroff(o)" [1] http://pastebin.ubuntu.com/14326688/ [2] https://lkml.org/lkml/2012/9/18/577 [3] http://review.omapzoom.org/#/c/34898/
On Fri, Jan 15, 2016 at 03:29:04PM +0200, Grygorii Strashko wrote: > Seems ARM doesn't have endless loop implemented in machine_power_off() - so, > not too much chances for Watchdog to fire. > void machine_power_off(void) > { > local_irq_disable(); > smp_send_stop(); > > if (pm_power_off) > pm_power_off(); > > --- endless loop ? > --- or restart ? > } > [and even if it will be there - 20-30sec is usual timeout for Watchdog > and this enough time to burn the system in case of thermal emergency > poweroff :(] I covered this in my reply to Ingo yesterday. The result is that a failed or unimplemented call drops through to do_exit(0) on behalf of the calling process, terminating that process. However, as I said in that same email, I don't think you're getting anywhere near this code. > That's true - original log [1] has > Nov 30 11:19:22 [ 5.942769] thermal thermal_zone3: critical temperature reached(108 C),shutting down > [...] > Nov 30 11:19:24 [ 7.387900] ahci 4a140000.sata: flags: 64bit ncq sntf stag pm led clo only pmp pio slum part ccc apst > Nov 30 11:19:24 INIT: Switching to runlevel: 0 > Nov 30 11:19:24 INIT: Sending processes the TERM signal > > and there are no > [ 220.004522] reboot: Power down Right, so things are stuck in userspace, which means the system is still in an active runnable state. As I mentioned (again) in my email, the issue appears to be that the 'rc' script is stuck waiting on a FIFO. The init daemon is trying to do an orderly shutdown. As part of that, it's executing the 'rc' script, which in systems I've seen, runs through a set of scripts in the /etc/rc?.d directory in order, which normally bring up or take down services and perform other sequenced actions. If this script hangs (as it seems to be doing) it won't get to running /sbin/poweroff or similar, and that means machine_power_off() won't be called. > In general, this kind of use case can be simulated using SysRq on any arch > - [3.290034] Freeing unused kernel memory: 492K (c0a67000 - c0ae2000) > INIT: version 2.88 booting > Starting udev > ^^ The issue most probably might happens when system in the process of > loading modules > So, once modules loading process is started - fire Sysrq "poweroff(o)" This suggests it could be a udev issue - but without knowing what's happening inside sysvinit's scripts, it's hard to know for certain. Adding some debug to the 'rc' script (make sure it works without rebooting or changing the run level, or have a way of restoring the file if it fails to boot) so that it's possible to see what it's doing may be a good idea - the simplest approach may be to just add set -x towards the top of the file - which will make it very noisy.
* Grygorii Strashko <grygorii.strashko@ti.com> wrote: > On 01/15/2016 12:14 PM, Ingo Molnar wrote: > > > > * One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote: > > > >>> If kernel_power_off() is called then the system should power off. No ifs and > >>> whens. > >> > >> Even if it doesn't the watchdog should kill it. > >> > >> That is broken on some platforms on the watchdog side as the > >> watchdog shuts down during our power off callbacks - because the system > >> firmware is too stupid to reset the watchdog as it powers back up (so > >> keeps rebooting). > >> > >> If you watchdog and firmware function properly you shouldn't even have to > >> care if you crash during the kernel power off. > > > > That's a good point as well - if the system is 'stuck' for some notion of stuck, > > then watchdog drivers can help. > > > > Seems ARM doesn't have endless loop implemented in machine_power_off() - so, > not too much chances for Watchdog to fire. > void machine_power_off(void) > { > local_irq_disable(); > smp_send_stop(); > > if (pm_power_off) > pm_power_off(); > > --- endless loop ? > --- or restart ? > } > [and even if it will be there - 20-30sec is usual timeout for Watchdog and this > enough time to burn the system in case of thermal emergency poweroff :(] > > > Here it's unclear whether user-space even called the sys_reboot() system call. > > > > That's true - original log [1] has > Nov 30 11:19:22 [ 5.942769] thermal thermal_zone3: critical temperature reached(108 C),shutting down > [...] > Nov 30 11:19:24 [ 7.387900] ahci 4a140000.sata: flags: 64bit ncq sntf stag pm led clo only pmp pio slum part ccc apst > Nov 30 11:19:24 INIT: Switching to runlevel: 0 > Nov 30 11:19:24 INIT: Sending processes the TERM signal > > and there are no > [ 220.004522] reboot: Power down > > > Also, It's not the first time this part of code is discussed (thermal emergency poweroff) [2], > so the good question, as for me, is it really required and safe to use orderly_poweroff() in > case of thermal emergency poweroff ([3] as example)? > > In general, this kind of use case can be simulated using SysRq on any arch > - [3.290034] Freeing unused kernel memory: 492K (c0a67000 - c0ae2000) > INIT: version 2.88 booting > Starting udev > ^^ The issue most probably might happens when system in the process of loading modules > So, once modules loading process is started - fire Sysrq "poweroff(o)" So I'd say emergency poweroff should be named accordingly - and the orderly_poweroff() name suggest anything but an emergency, right? So I'd be fine with the following: - introduce a poweroff_emergency() core kernel function call - use it in drivers where it's justified - poweroff_emergency() has a configurable timeout value. If the timeout value is set to 0 then it powers the system off immediately. Functionally it would be mostly equivalent to your current patch (except the '0' immediate poweroff functionality). Thanks, Ingo
Hi Ingo, On Tuesday 19 January 2016 02:36 PM, Ingo Molnar wrote: > > * Grygorii Strashko <grygorii.strashko@ti.com> wrote: > >> On 01/15/2016 12:14 PM, Ingo Molnar wrote: >>> >>> * One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk> wrote: >>> >>>>> If kernel_power_off() is called then the system should power off. No ifs and >>>>> whens. >>>> >>>> Even if it doesn't the watchdog should kill it. >>>> >>>> That is broken on some platforms on the watchdog side as the >>>> watchdog shuts down during our power off callbacks - because the system >>>> firmware is too stupid to reset the watchdog as it powers back up (so >>>> keeps rebooting). >>>> >>>> If you watchdog and firmware function properly you shouldn't even have to >>>> care if you crash during the kernel power off. >>> >>> That's a good point as well - if the system is 'stuck' for some notion of stuck, >>> then watchdog drivers can help. >>> >> >> Seems ARM doesn't have endless loop implemented in machine_power_off() - so, >> not too much chances for Watchdog to fire. >> void machine_power_off(void) >> { >> local_irq_disable(); >> smp_send_stop(); >> >> if (pm_power_off) >> pm_power_off(); >> >> --- endless loop ? >> --- or restart ? >> } >> [and even if it will be there - 20-30sec is usual timeout for Watchdog and this >> enough time to burn the system in case of thermal emergency poweroff :(] >> >>> Here it's unclear whether user-space even called the sys_reboot() system call. >>> >> >> That's true - original log [1] has >> Nov 30 11:19:22 [ 5.942769] thermal thermal_zone3: critical temperature reached(108 C),shutting down >> [...] >> Nov 30 11:19:24 [ 7.387900] ahci 4a140000.sata: flags: 64bit ncq sntf stag pm led clo only pmp pio slum part ccc apst >> Nov 30 11:19:24 INIT: Switching to runlevel: 0 >> Nov 30 11:19:24 INIT: Sending processes the TERM signal >> >> and there are no >> [ 220.004522] reboot: Power down >> >> >> Also, It's not the first time this part of code is discussed (thermal emergency poweroff) [2], >> so the good question, as for me, is it really required and safe to use orderly_poweroff() in >> case of thermal emergency poweroff ([3] as example)? >> >> In general, this kind of use case can be simulated using SysRq on any arch >> - [3.290034] Freeing unused kernel memory: 492K (c0a67000 - c0ae2000) >> INIT: version 2.88 booting >> Starting udev >> ^^ The issue most probably might happens when system in the process of loading modules >> So, once modules loading process is started - fire Sysrq "poweroff(o)" > > So I'd say emergency poweroff should be named accordingly - and the > orderly_poweroff() name suggest anything but an emergency, right? > > So I'd be fine with the following: > > - introduce a poweroff_emergency() core kernel function call > > - use it in drivers where it's justified > > - poweroff_emergency() has a configurable timeout value. If the timeout value is > set to 0 then it powers the system off immediately. > > Functionally it would be mostly equivalent to your current patch (except the '0' > immediate poweroff functionality). Thanks for the suggestion. I will work on this and get back. Best Regards, Keerthy > > Thanks, > > Ingo >
Index: linux/arch/Kconfig =================================================================== --- linux.orig/arch/Kconfig 2016-01-11 15:26:07.732173131 +0530 +++ linux/arch/Kconfig 2016-01-11 15:26:07.728173205 +0530 @@ -37,6 +37,18 @@ def_bool y depends on PERF_EVENTS && HAVE_PERF_EVENTS_NMI && !PPC64 +config SHUTDOWN_BACKUP_DELAY_MS + int "Backup shutdown delay in milli-seconds" + default 0 + help + The number of milliseconds to delay before backup workqueue + executes attempting to poweroff the system after the + orderly_poweroff function has failed to complete. + + If set to 0, the backup workqueue is not active. The value + should be conservatively configured based on userspace latencies + expected for a given system. + config KPROBES bool "Kprobes" depends on MODULES Index: linux/kernel/reboot.c =================================================================== --- linux.orig/kernel/reboot.c 2016-01-11 15:26:07.732173131 +0530 +++ linux/kernel/reboot.c 2016-01-11 15:38:33.502341511 +0530 @@ -424,6 +424,38 @@ return ret; } +#if CONFIG_SHUTDOWN_BACKUP_DELAY_MS + +/** + * shutdown_backup_func - shutdown backup work after a known delay + * @work: work_struct associated with the backup shutdown function + * + * In system bootup sequences like those in the middle of driver probes being + * initiated, userspace will be unable to power off the system in a clean + * manner and leaves the system in a critical state. + * In such scenarios, this backup workqueue forces a shutdown of the system + * when orderly shutdown is not successful after a configurable time period. + */ +static void shutdown_backup_func(struct work_struct *work) +{ + pr_warn("Orderly_poweroff has failed! Attempting kernel_power_off\n"); + kernel_power_off(); + + pr_warn("kernel_power_off has failed! Attempting emergency_restart\n"); + emergency_restart(); +} + +static DECLARE_DELAYED_WORK(bkup_shutdown_work, shutdown_backup_func); + +static inline void setup_backup_shutdown_workqueue(void) +{ + schedule_delayed_work(&bkup_shutdown_work, + msecs_to_jiffies(CONFIG_SHUTDOWN_BACKUP_DELAY_MS)); +} +#else +static inline void setup_backup_shutdown_workqueue(void) { } +#endif + static int __orderly_poweroff(bool force) { int ret; @@ -442,6 +474,12 @@ kernel_power_off(); } + /* + * Schedule a backup work function to execute after a known time + * when orderly shutdown fails. + */ + setup_backup_shutdown_workqueue(); + return ret; }
orderly_poweroff is triggered when a graceful shutdown of system is desired. This may be used in many critical states of the kernel such as when subsystems detects conditions such as critical temperature conditions. However, in certain conditions in system boot up sequences like those in the middle of driver probes being initiated, userspace will be unable to power off the system in a clean manner and leaves the system in a critical state. In cases like these, the /sbin/poweroff will return success (having forked off to attempt powering off the system. However, the system overall will fail to completely poweroff (since other modules will be probed) and the system is still functional with no userspace (since that would have shut itself off). However, there is no clean way of detecting such failure of userspace powering off the system. In such scenarios, it is necessary for a backup workqueue to be able to force a shutdown of the system when orderly shutdown is not successful after a configurable time period. Signed-off-by: Keerthy <j-keerthy@ti.com> Suggested-by: Eduardo Valentin <edubezval@gmail.com> Reported-by: Nishanth Menon <nm@ti.com> --- Links to previous discussion can be found here: http://www.spinics.net/lists/linux-omap/msg124925.html Boot tested on DRA7. changes in v2: * Changed #ifdef to #if CONFIG_SHUTDOWN_BACKUP_DELAY_MS arch/Kconfig | 7 +++++++ kernel/reboot.c | 23 ++++++++++++++++++----- 2 files changed, 25 insertions(+), 5 deletions(-)