Message ID | 1490941820-13511-1-git-send-email-j-keerthy@ti.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Eduardo Valentin |
Headers | show |
Hey, On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: > orderly_poweroff is triggered when a graceful shutdown > of system is desired. This may be used in many critical states of the > kernel such as when subsystems detects conditions such as critical > temperature conditions. However, in certain conditions in system > boot up sequences like those in the middle of driver probes being > initiated, userspace will be unable to power off the system in a clean > manner and leaves the system in a critical state. In cases like these, > the /sbin/poweroff will return success (having forked off to attempt > powering off the system. However, the system overall will fail to > completely poweroff (since other modules will be probed) and the system > is still functional with no userspace (since that would have shut itself > off). OK... This seams to me, still a corner case supposed to be fixed at orderly_power_off, not at thermal. But.. > > However, there is no clean way of detecting such failure of userspace > powering off the system. In such scenarios, it is necessary for a backup > workqueue to be able to force a shutdown of the system when orderly > shutdown is not successful after a configurable time period. > Given that system running hot is a thermal issue, I guess we care more on this matter then.. > Reported-by: Nishanth Menon <nm@ti.com> > Signed-off-by: Keerthy <j-keerthy@ti.com> > --- > drivers/thermal/Kconfig | 13 +++++++++++++ > drivers/thermal/thermal_core.c | 42 ++++++++++++++++++++++++++++++++++++++++++ > 2 files changed, 55 insertions(+) > > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig > index 0a16cf4..4cc55f9 100644 > --- a/drivers/thermal/Kconfig > +++ b/drivers/thermal/Kconfig > @@ -15,6 +15,19 @@ menuconfig THERMAL > > if THERMAL > > +config THERMAL_EMERGENCY_POWEROFF_DELAY_MS > + int "Emergency poweroff delay in milli-seconds" > + depends on THERMAL > + default 0 > + help > + The number of milliseconds to delay before emergency > + poweroff kicks in. The delay should be carefully profiled > + so as to give adequate time for orderly_poweroff. In case > + of failure of an orderly_poweroff the emergency poweroff > + kicks in after the delay has elapsed and shuts down the system. > + > + If set to 0 poweroff will happen immediately. > + > config THERMAL_HWMON > bool > prompt "Expose thermal sensors as hwmon device" > diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c > index 11f0675..dc7fdd4 100644 > --- a/drivers/thermal/thermal_core.c > +++ b/drivers/thermal/thermal_core.c > @@ -322,6 +322,47 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, > def_governor->throttle(tz, trip); > } > > +/** > + * emergency_poweroff_func - emergency poweroff work after a known delay > + * @work: work_struct associated with the emergency poweroff function > + * > + * This function is called in very critical situations to force > + * a kernel poweroff after a configurable timeout value. > + */ > +static void emergency_poweroff_func(struct work_struct *work) > +{ > + /** > + * We have reached here after the emergency thermal shutdown > + * Waiting period has expired. This means orderly_poweroff has > + * not been able to shut off the system for some reason. > + * Try to shut down the system immediately using pm_power_off > + * if populated > + */ The above is not a kernel doc entry... > + pr_warn("Attempting kernel_power_off\n"); > + if (pm_power_off) > + pm_power_off(); Why not calling kernel_power_off() directly instead? That is what is called by orderly power off in case it fails, which seams to be the missing part when user land returns success, and therefore we don't call kernel_power_off(). That path goes through the machine_power_off(), which seams to be the default for pm_power_off() anyway. kernel_power_off() handles the power off system call too. > + > + /** not a kernel doc entry... > + * Worst of the worst case trigger emergency restart > + */ > + pr_warn("kernel_power_off has failed! Attempting emergency_restart\n"); > + emergency_restart(); > +} > + > +static DECLARE_DELAYED_WORK(emergency_poweroff_work, emergency_poweroff_func); > + > +/** > + * emergency_poweroff - Trigger an emergency system poweroff > + * > + * This may be called from any critical situation to trigger a system shutdown > + * after a known period of time. By default the delay is 0 millisecond > + */ > +void thermal_emergency_poweroff(void) > +{ > + schedule_delayed_work(&emergency_poweroff_work, > + msecs_to_jiffies(CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS)); > +} > + > static void handle_critical_trips(struct thermal_zone_device *tz, > int trip, enum thermal_trip_type trip_type) > { > @@ -343,6 +384,7 @@ static void handle_critical_trips(struct thermal_zone_device *tz, > "critical temperature reached(%d C),shutting down\n", > tz->temperature / 1000); > orderly_poweroff(true); > + thermal_emergency_poweroff(); Shouldn't we start count the timeout before calling orderly_poweroff? > } > } > > -- > 1.9.1 >
On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote: > Hey, > > On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: >> orderly_poweroff is triggered when a graceful shutdown >> of system is desired. This may be used in many critical states of the >> kernel such as when subsystems detects conditions such as critical >> temperature conditions. However, in certain conditions in system >> boot up sequences like those in the middle of driver probes being >> initiated, userspace will be unable to power off the system in a clean >> manner and leaves the system in a critical state. In cases like these, >> the /sbin/poweroff will return success (having forked off to attempt >> powering off the system. However, the system overall will fail to >> completely poweroff (since other modules will be probed) and the system >> is still functional with no userspace (since that would have shut itself >> off). > > OK... This seams to me, still a corner case supposed to be fixed at > orderly_power_off, not at thermal. But.. > >> >> However, there is no clean way of detecting such failure of userspace >> powering off the system. In such scenarios, it is necessary for a backup >> workqueue to be able to force a shutdown of the system when orderly >> shutdown is not successful after a configurable time period. >> > > Given that system running hot is a thermal issue, I guess we care more > on this matter then.. Yes! > >> Reported-by: Nishanth Menon <nm@ti.com> >> Signed-off-by: Keerthy <j-keerthy@ti.com> >> --- >> drivers/thermal/Kconfig | 13 +++++++++++++ >> drivers/thermal/thermal_core.c | 42 ++++++++++++++++++++++++++++++++++++++++++ >> 2 files changed, 55 insertions(+) >> >> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig >> index 0a16cf4..4cc55f9 100644 >> --- a/drivers/thermal/Kconfig >> +++ b/drivers/thermal/Kconfig >> @@ -15,6 +15,19 @@ menuconfig THERMAL >> >> if THERMAL >> >> +config THERMAL_EMERGENCY_POWEROFF_DELAY_MS >> + int "Emergency poweroff delay in milli-seconds" >> + depends on THERMAL >> + default 0 >> + help >> + The number of milliseconds to delay before emergency >> + poweroff kicks in. The delay should be carefully profiled >> + so as to give adequate time for orderly_poweroff. In case >> + of failure of an orderly_poweroff the emergency poweroff >> + kicks in after the delay has elapsed and shuts down the system. >> + >> + If set to 0 poweroff will happen immediately. >> + >> config THERMAL_HWMON >> bool >> prompt "Expose thermal sensors as hwmon device" >> diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c >> index 11f0675..dc7fdd4 100644 >> --- a/drivers/thermal/thermal_core.c >> +++ b/drivers/thermal/thermal_core.c >> @@ -322,6 +322,47 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, >> def_governor->throttle(tz, trip); >> } >> >> +/** >> + * emergency_poweroff_func - emergency poweroff work after a known delay >> + * @work: work_struct associated with the emergency poweroff function >> + * >> + * This function is called in very critical situations to force >> + * a kernel poweroff after a configurable timeout value. >> + */ >> +static void emergency_poweroff_func(struct work_struct *work) >> +{ >> + /** >> + * We have reached here after the emergency thermal shutdown >> + * Waiting period has expired. This means orderly_poweroff has >> + * not been able to shut off the system for some reason. >> + * Try to shut down the system immediately using pm_power_off >> + * if populated >> + */ > > The above is not a kernel doc entry... I will fix that. > >> + pr_warn("Attempting kernel_power_off\n"); >> + if (pm_power_off) >> + pm_power_off(); > > Why not calling kernel_power_off() directly instead? That is what is called by orderly > power off in case it fails, which seams to be the missing part when > user land returns success, and therefore we don't call > kernel_power_off(). That path goes through the machine_power_off(), > which seams to be the default for pm_power_off() anyway. > > kernel_power_off() handles the power off system call too. Yes. This is after orderly_poweroff fails so i felt why go through kernel_power_off and directly call pm_power_off which directly pulls out the power plug. This is in dire straits situation. Hence preferred to call the last piece directly. > >> + >> + /** > > not a kernel doc entry... Okay. > >> + * Worst of the worst case trigger emergency restart >> + */ >> + pr_warn("kernel_power_off has failed! Attempting emergency_restart\n"); >> + emergency_restart(); >> +} >> + >> +static DECLARE_DELAYED_WORK(emergency_poweroff_work, emergency_poweroff_func); >> + >> +/** >> + * emergency_poweroff - Trigger an emergency system poweroff >> + * >> + * This may be called from any critical situation to trigger a system shutdown >> + * after a known period of time. By default the delay is 0 millisecond >> + */ >> +void thermal_emergency_poweroff(void) >> +{ >> + schedule_delayed_work(&emergency_poweroff_work, >> + msecs_to_jiffies(CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS)); >> +} >> + >> static void handle_critical_trips(struct thermal_zone_device *tz, >> int trip, enum thermal_trip_type trip_type) >> { >> @@ -343,6 +384,7 @@ static void handle_critical_trips(struct thermal_zone_device *tz, >> "critical temperature reached(%d C),shutting down\n", >> tz->temperature / 1000); >> orderly_poweroff(true); >> + thermal_emergency_poweroff(); > > Shouldn't we start count the timeout before calling orderly_poweroff? Okay yes. That makes more sense. Queue the emergency function, start the countdown and immediately call the orderly_poweroff. I will fix the above comments and send a v2. I still want to go with pm_power_off over kernel_poweroff as we have already elapsed the time out and the first thing we want is to shut off the SoC! Let me know. > >> } >> } >> >> -- >> 1.9.1 >>
On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote: > > On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote: > > > > Hey, > > > > On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: > > > > > > orderly_poweroff is triggered when a graceful shutdown > > > of system is desired. This may be used in many critical states of > > > the > > > kernel such as when subsystems detects conditions such as > > > critical > > > temperature conditions. However, in certain conditions in system > > > boot up sequences like those in the middle of driver probes being > > > initiated, userspace will be unable to power off the system in a > > > clean > > > manner and leaves the system in a critical state. In cases like > > > these, > > > the /sbin/poweroff will return success (having forked off to > > > attempt > > > powering off the system. However, the system overall will fail to > > > completely poweroff (since other modules will be probed) and the > > > system > > > is still functional with no userspace (since that would have shut > > > itself > > > off). > > OK... This seams to me, still a corner case supposed to be fixed at > > orderly_power_off, not at thermal. But.. > > > > > > > > > > > However, there is no clean way of detecting such failure of > > > userspace > > > powering off the system. In such scenarios, it is necessary for a > > > backup > > > workqueue to be able to force a shutdown of the system when > > > orderly > > > shutdown is not successful after a configurable time period. > > > > > Given that system running hot is a thermal issue, I guess we care > > more > > on this matter then.. > Yes! > I just read this thread again https://patchwork.kernel.org/patch/802458 1/ to recall the previous discussion. https://patchwork.kernel.org/patch/8149891/ https://patchwork.kernel.org/patch/8149861/ should be the solution made based on Ingo' suggestion, right? And to me, this sounds like the right direction to go, thermal does not need a back up shutdown solution, it just needs a kernel function call which guarantees the system can be shutdown/reboot immediately. is there any reason that patch 1/2 is not accepted? thanks, rui > > > > > > > > > > Reported-by: Nishanth Menon <nm@ti.com> > > > Signed-off-by: Keerthy <j-keerthy@ti.com> > > > --- > > > drivers/thermal/Kconfig | 13 +++++++++++++ > > > drivers/thermal/thermal_core.c | 42 > > > ++++++++++++++++++++++++++++++++++++++++++ > > > 2 files changed, 55 insertions(+) > > > > > > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig > > > index 0a16cf4..4cc55f9 100644 > > > --- a/drivers/thermal/Kconfig > > > +++ b/drivers/thermal/Kconfig > > > @@ -15,6 +15,19 @@ menuconfig THERMAL > > > > > > if THERMAL > > > > > > +config THERMAL_EMERGENCY_POWEROFF_DELAY_MS > > > + int "Emergency poweroff delay in milli-seconds" > > > + depends on THERMAL > > > + default 0 > > > + help > > > + The number of milliseconds to delay before emergency > > > + poweroff kicks in. The delay should be carefully > > > profiled > > > + so as to give adequate time for orderly_poweroff. In > > > case > > > + of failure of an orderly_poweroff the emergency > > > poweroff > > > + kicks in after the delay has elapsed and shuts down > > > the system. > > > + > > > + If set to 0 poweroff will happen immediately. > > > + > > > config THERMAL_HWMON > > > bool > > > prompt "Expose thermal sensors as hwmon device" > > > diff --git a/drivers/thermal/thermal_core.c > > > b/drivers/thermal/thermal_core.c > > > index 11f0675..dc7fdd4 100644 > > > --- a/drivers/thermal/thermal_core.c > > > +++ b/drivers/thermal/thermal_core.c > > > @@ -322,6 +322,47 @@ static void handle_non_critical_trips(struct > > > thermal_zone_device *tz, > > > def_governor->throttle(tz, trip); > > > } > > > > > > +/** > > > + * emergency_poweroff_func - emergency poweroff work after a > > > known delay > > > + * @work: work_struct associated with the emergency poweroff > > > function > > > + * > > > + * This function is called in very critical situations to force > > > + * a kernel poweroff after a configurable timeout value. > > > + */ > > > +static void emergency_poweroff_func(struct work_struct *work) > > > +{ > > > + /** > > > + * We have reached here after the emergency thermal > > > shutdown > > > + * Waiting period has expired. This means > > > orderly_poweroff has > > > + * not been able to shut off the system for some reason. > > > + * Try to shut down the system immediately using > > > pm_power_off > > > + * if populated > > > + */ > > The above is not a kernel doc entry... > I will fix that. > > > > > > > > > > > + pr_warn("Attempting kernel_power_off\n"); > > > + if (pm_power_off) > > > + pm_power_off(); > > Why not calling kernel_power_off() directly instead? That is what > > is called by orderly > > power off in case it fails, which seams to be the missing part > > when > > user land returns success, and therefore we don't call > > kernel_power_off(). That path goes through the machine_power_off(), > > which seams to be the default for pm_power_off() anyway. > > > > kernel_power_off() handles the power off system call too. > Yes. This is after orderly_poweroff fails so i felt why go through > kernel_power_off and directly call pm_power_off which directly pulls > out > the power plug. This is in dire straits situation. Hence preferred to > call the last piece directly. > > > > > > > > > > > + > > > + /** > > not a kernel doc entry... > Okay. > > > > > > > > > > > + * Worst of the worst case trigger emergency restart > > > + */ > > > + pr_warn("kernel_power_off has failed! Attempting > > > emergency_restart\n"); > > > + emergency_restart(); > > > +} > > > + > > > +static DECLARE_DELAYED_WORK(emergency_poweroff_work, > > > emergency_poweroff_func); > > > + > > > +/** > > > + * emergency_poweroff - Trigger an emergency system poweroff > > > + * > > > + * This may be called from any critical situation to trigger a > > > system shutdown > > > + * after a known period of time. By default the delay is 0 > > > millisecond > > > + */ > > > +void thermal_emergency_poweroff(void) > > > +{ > > > + schedule_delayed_work(&emergency_poweroff_work, > > > + msecs_to_jiffies(CONFIG_THERMAL_EM > > > ERGENCY_POWEROFF_DELAY_MS)); > > > +} > > > + > > > static void handle_critical_trips(struct thermal_zone_device > > > *tz, > > > int trip, enum > > > thermal_trip_type trip_type) > > > { > > > @@ -343,6 +384,7 @@ static void handle_critical_trips(struct > > > thermal_zone_device *tz, > > > "critical temperature reached(%d > > > C),shutting down\n", > > > tz->temperature / 1000); > > > orderly_poweroff(true); > > > + thermal_emergency_poweroff(); > > Shouldn't we start count the timeout before calling > > orderly_poweroff? > Okay yes. That makes more sense. Queue the emergency function, start > the > countdown and immediately call the orderly_poweroff. I will fix the > above comments and send a v2. I still want to go with pm_power_off > over > kernel_poweroff as we have already elapsed the time out and the first > thing we want is to shut off the SoC! Let me know. > > > > > > > > > > > > } > > > } > > >
On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote: > On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote: >> >> On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote: >>> >>> Hey, >>> >>> On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: >>>> >>>> orderly_poweroff is triggered when a graceful shutdown >>>> of system is desired. This may be used in many critical states of >>>> the >>>> kernel such as when subsystems detects conditions such as >>>> critical >>>> temperature conditions. However, in certain conditions in system >>>> boot up sequences like those in the middle of driver probes being >>>> initiated, userspace will be unable to power off the system in a >>>> clean >>>> manner and leaves the system in a critical state. In cases like >>>> these, >>>> the /sbin/poweroff will return success (having forked off to >>>> attempt >>>> powering off the system. However, the system overall will fail to >>>> completely poweroff (since other modules will be probed) and the >>>> system >>>> is still functional with no userspace (since that would have shut >>>> itself >>>> off). >>> OK... This seams to me, still a corner case supposed to be fixed at >>> orderly_power_off, not at thermal. But.. >>> >>>> >>>> >>>> However, there is no clean way of detecting such failure of >>>> userspace >>>> powering off the system. In such scenarios, it is necessary for a >>>> backup >>>> workqueue to be able to force a shutdown of the system when >>>> orderly >>>> shutdown is not successful after a configurable time period. >>>> >>> Given that system running hot is a thermal issue, I guess we care >>> more >>> on this matter then.. >> Yes! >> > I just read this thread again https://patchwork.kernel.org/patch/802458 > 1/ to recall the previous discussion. > > https://patchwork.kernel.org/patch/8149891/ > https://patchwork.kernel.org/patch/8149861/ > should be the solution made based on Ingo' suggestion, right? > > And to me, this sounds like the right direction to go, thermal does not > need a back up shutdown solution, it just needs a kernel function call > which guarantees the system can be shutdown/reboot immediately. > > is there any reason that patch 1/2 is not accepted? Zhang, http://www.serverphorums.com/read.php?12,1400964 I got a NAK from Alan and was given this direction on a thermal_poweroff which is more or less what is done in this patch. Thanks, Keerthy > > thanks, > rui >>> >>> >>>> >>>> Reported-by: Nishanth Menon <nm@ti.com> >>>> Signed-off-by: Keerthy <j-keerthy@ti.com> >>>> --- >>>> drivers/thermal/Kconfig | 13 +++++++++++++ >>>> drivers/thermal/thermal_core.c | 42 >>>> ++++++++++++++++++++++++++++++++++++++++++ >>>> 2 files changed, 55 insertions(+) >>>> >>>> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig >>>> index 0a16cf4..4cc55f9 100644 >>>> --- a/drivers/thermal/Kconfig >>>> +++ b/drivers/thermal/Kconfig >>>> @@ -15,6 +15,19 @@ menuconfig THERMAL >>>> >>>> if THERMAL >>>> >>>> +config THERMAL_EMERGENCY_POWEROFF_DELAY_MS >>>> + int "Emergency poweroff delay in milli-seconds" >>>> + depends on THERMAL >>>> + default 0 >>>> + help >>>> + The number of milliseconds to delay before emergency >>>> + poweroff kicks in. The delay should be carefully >>>> profiled >>>> + so as to give adequate time for orderly_poweroff. In >>>> case >>>> + of failure of an orderly_poweroff the emergency >>>> poweroff >>>> + kicks in after the delay has elapsed and shuts down >>>> the system. >>>> + >>>> + If set to 0 poweroff will happen immediately. >>>> + >>>> config THERMAL_HWMON >>>> bool >>>> prompt "Expose thermal sensors as hwmon device" >>>> diff --git a/drivers/thermal/thermal_core.c >>>> b/drivers/thermal/thermal_core.c >>>> index 11f0675..dc7fdd4 100644 >>>> --- a/drivers/thermal/thermal_core.c >>>> +++ b/drivers/thermal/thermal_core.c >>>> @@ -322,6 +322,47 @@ static void handle_non_critical_trips(struct >>>> thermal_zone_device *tz, >>>> def_governor->throttle(tz, trip); >>>> } >>>> >>>> +/** >>>> + * emergency_poweroff_func - emergency poweroff work after a >>>> known delay >>>> + * @work: work_struct associated with the emergency poweroff >>>> function >>>> + * >>>> + * This function is called in very critical situations to force >>>> + * a kernel poweroff after a configurable timeout value. >>>> + */ >>>> +static void emergency_poweroff_func(struct work_struct *work) >>>> +{ >>>> + /** >>>> + * We have reached here after the emergency thermal >>>> shutdown >>>> + * Waiting period has expired. This means >>>> orderly_poweroff has >>>> + * not been able to shut off the system for some reason. >>>> + * Try to shut down the system immediately using >>>> pm_power_off >>>> + * if populated >>>> + */ >>> The above is not a kernel doc entry... >> I will fix that. >> >>> >>> >>>> >>>> + pr_warn("Attempting kernel_power_off\n"); >>>> + if (pm_power_off) >>>> + pm_power_off(); >>> Why not calling kernel_power_off() directly instead? That is what >>> is called by orderly >>> power off in case it fails, which seams to be the missing part >>> when >>> user land returns success, and therefore we don't call >>> kernel_power_off(). That path goes through the machine_power_off(), >>> which seams to be the default for pm_power_off() anyway. >>> >>> kernel_power_off() handles the power off system call too. >> Yes. This is after orderly_poweroff fails so i felt why go through >> kernel_power_off and directly call pm_power_off which directly pulls >> out >> the power plug. This is in dire straits situation. Hence preferred to >> call the last piece directly. >> >>> >>> >>>> >>>> + >>>> + /** >>> not a kernel doc entry... >> Okay. >> >>> >>> >>>> >>>> + * Worst of the worst case trigger emergency restart >>>> + */ >>>> + pr_warn("kernel_power_off has failed! Attempting >>>> emergency_restart\n"); >>>> + emergency_restart(); >>>> +} >>>> + >>>> +static DECLARE_DELAYED_WORK(emergency_poweroff_work, >>>> emergency_poweroff_func); >>>> + >>>> +/** >>>> + * emergency_poweroff - Trigger an emergency system poweroff >>>> + * >>>> + * This may be called from any critical situation to trigger a >>>> system shutdown >>>> + * after a known period of time. By default the delay is 0 >>>> millisecond >>>> + */ >>>> +void thermal_emergency_poweroff(void) >>>> +{ >>>> + schedule_delayed_work(&emergency_poweroff_work, >>>> + msecs_to_jiffies(CONFIG_THERMAL_EM >>>> ERGENCY_POWEROFF_DELAY_MS)); >>>> +} >>>> + >>>> static void handle_critical_trips(struct thermal_zone_device >>>> *tz, >>>> int trip, enum >>>> thermal_trip_type trip_type) >>>> { >>>> @@ -343,6 +384,7 @@ static void handle_critical_trips(struct >>>> thermal_zone_device *tz, >>>> "critical temperature reached(%d >>>> C),shutting down\n", >>>> tz->temperature / 1000); >>>> orderly_poweroff(true); >>>> + thermal_emergency_poweroff(); >>> Shouldn't we start count the timeout before calling >>> orderly_poweroff? >> Okay yes. That makes more sense. Queue the emergency function, start >> the >> countdown and immediately call the orderly_poweroff. I will fix the >> above comments and send a v2. I still want to go with pm_power_off >> over >> kernel_poweroff as we have already elapsed the time out and the first >> thing we want is to shut off the SoC! Let me know. >> >> >>> >>> >>>> >>>> } >>>> } >>>>
Keerthy, On Wed, Apr 12, 2017 at 09:09:36AM +0530, Keerthy wrote: > > > On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote: > > On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote: > >> > >> On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote: > >>> > >>> Hey, > >>> > >>> On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: > >>>> > >>>> off). <cut> > >>> OK... This seams to me, still a corner case supposed to be fixed at > >>> orderly_power_off, not at thermal. But.. > >>> ^^^ Then again, this must be fixed not at thermal core. And re-reading the history of this thread, this seams to be really something broken at OMAP/DRA7, as mentioned in previous messages. That is probably why you are pushing for pm_power_off(), which seams to be the one that works for you, pulling the plug correctly (DRA7). > >>>> > >>>> > >>>> However, there is no clean way of detecting such failure of > >>>> userspace > >>>> powering off the system. In such scenarios, it is necessary for a > >>>> backup > >>>> workqueue to be able to force a shutdown of the system when > >>>> orderly > >>>> shutdown is not successful after a configurable time period. > >>>> > >>> Given that system running hot is a thermal issue, I guess we care > >>> more > >>> on this matter then.. > >> Yes! > >> > > I just read this thread again https://patchwork.kernel.org/patch/802458 > > 1/ to recall the previous discussion. > > > > https://patchwork.kernel.org/patch/8149891/ > > https://patchwork.kernel.org/patch/8149861/ > > should be the solution made based on Ingo' suggestion, right? > > > > And to me, this sounds like the right direction to go, thermal does not > > need a back up shutdown solution, it just needs a kernel function call > > which guarantees the system can be shutdown/reboot immediately. > > > > is there any reason that patch 1/2 is not accepted? > > Zhang, > > http://www.serverphorums.com/read.php?12,1400964 > > I got a NAK from Alan and was given this direction on a thermal_poweroff > which is more or less what is done in this patch. > Actually, Alan's suggestion is more for you to define a thermal_poweroff() that can be defined per architecture. Also, please, keep track of your patch versions and also do copy everybody who has stated their opinion on previous discussions. These patches must have Ingo, Alan, and RMK copied too. In this way we avoid loosing track of what has been suggested and we also converge faster to something everybody (or most of us) agree. Next version, please, fix that. To me, thermal core needs a function that simply powers off the system. No timeouts, delayed works, backups, etc. Simple and straight. The idea of having a per architecture implementation, as per Alan's suggestion, makes sense to me too. Having something different from pm_power_off(), specific to thermal, might also give the opportunity to save the power off reason. BR, Eduardo Valentin
On Wednesday 12 April 2017 09:35 AM, Eduardo Valentin wrote: > Keerthy, > > On Wed, Apr 12, 2017 at 09:09:36AM +0530, Keerthy wrote: >> >> >> On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote: >>> On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote: >>>> >>>> On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote: >>>>> >>>>> Hey, >>>>> >>>>> On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: >>>>>> >>>>>> off). > > <cut> > >>>>> OK... This seams to me, still a corner case supposed to be fixed at >>>>> orderly_power_off, not at thermal. But.. >>>>> > > ^^^ Then again, this must be fixed not at thermal core. And re-reading > the history of this thread, this seams to be really something broken at > OMAP/DRA7, as mentioned in previous messages. That is probably why you > are pushing for pm_power_off(), which seams to be the one that works for > you, pulling the plug correctly (DRA7). > >>>>>> >>>>>> >>>>>> However, there is no clean way of detecting such failure of >>>>>> userspace >>>>>> powering off the system. In such scenarios, it is necessary for a >>>>>> backup >>>>>> workqueue to be able to force a shutdown of the system when >>>>>> orderly >>>>>> shutdown is not successful after a configurable time period. >>>>>> >>>>> Given that system running hot is a thermal issue, I guess we care >>>>> more >>>>> on this matter then.. >>>> Yes! >>>> >>> I just read this thread again https://patchwork.kernel.org/patch/802458 >>> 1/ to recall the previous discussion. >>> >>> https://patchwork.kernel.org/patch/8149891/ >>> https://patchwork.kernel.org/patch/8149861/ >>> should be the solution made based on Ingo' suggestion, right? >>> >>> And to me, this sounds like the right direction to go, thermal does not >>> need a back up shutdown solution, it just needs a kernel function call >>> which guarantees the system can be shutdown/reboot immediately. >>> >>> is there any reason that patch 1/2 is not accepted? >> >> Zhang, >> >> http://www.serverphorums.com/read.php?12,1400964 >> >> I got a NAK from Alan and was given this direction on a thermal_poweroff >> which is more or less what is done in this patch. >> > > > Actually, Alan's suggestion is more for you to define a > thermal_poweroff() that can be defined per architecture. > > Also, please, keep track of your patch versions and also do copy > everybody who has stated their opinion on previous discussions. These > patches must have Ingo, Alan, and RMK copied too. In this way we avoid > loosing track of what has been suggested and we also converge faster to > something everybody (or most of us) agree. Next version, please, fix > that. Sure. This was resurrected from last year. I will add the links to previous discussions. my bad. > > > To me, thermal core needs a function that simply powers off the system. > No timeouts, delayed works, backups, etc. Simple and straight. You mean replacing orderly_power_off during critical trip point cross over with a thermal specific thermal_poweroff function that ensures that hardware is indeed shut off? > > The idea of having a per architecture implementation, as per Alan's > suggestion, makes sense to me too. Having something different from > pm_power_off(), specific to thermal, might also give the opportunity to > save the power off reason. I did not get the 'save the power off reason' point. Care to explain more? > > BR, > > Eduardo Valentin >
On Wednesday 12 April 2017 09:35 AM, Eduardo Valentin wrote: > Keerthy, > > On Wed, Apr 12, 2017 at 09:09:36AM +0530, Keerthy wrote: >> >> >> On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote: >>> On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote: >>>> >>>> On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote: >>>>> >>>>> Hey, >>>>> >>>>> On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: >>>>>> >>>>>> off). > > <cut> > >>>>> OK... This seams to me, still a corner case supposed to be fixed at >>>>> orderly_power_off, not at thermal. But.. >>>>> > > ^^^ Then again, this must be fixed not at thermal core. And re-reading > the history of this thread, this seams to be really something broken at > OMAP/DRA7, as mentioned in previous messages. That is probably why you > are pushing for pm_power_off(), which seams to be the one that works for > you, pulling the plug correctly (DRA7). Zhang/Eduardo, OMAP5/DRA7 is one case. I believe i this is the root cause of this failure. thermal_zone_device_check --> thermal_zone_device_update --> handle_thermal_trip --> handle_critical_trips --> orderly_poweroff The above sequence happens every 250/500 mS based on the configuration. The orderly_poweroff function is getting called every 250/500 mS and i see with a full fledged nfs file system it takes at least 5-10 Seconds to shutdown and during that time we bombard with orderly_poweroff calls multiple times due to the thermal_zone_device_check triggering periodically. To confirm that i made sure that handle_critical_trips calls orderly_poweroff only once and i no longer see the failure on DRA72-EVM board. So IMHO once we get to handle_critical_trips case where we do orderly_poweroff we need to do the following: 1) Make sure that orderly_poweroff is called only once. 2) Cancel all the scheduled work queues to monitor the temperature as we have already reached a point of shutting down the system. Let me know your thoughts on this. Best Regards, Keerthy > >>>>>> >>>>>> >>>>>> However, there is no clean way of detecting such failure of >>>>>> userspace >>>>>> powering off the system. In such scenarios, it is necessary for a >>>>>> backup >>>>>> workqueue to be able to force a shutdown of the system when >>>>>> orderly >>>>>> shutdown is not successful after a configurable time period. >>>>>> >>>>> Given that system running hot is a thermal issue, I guess we care >>>>> more >>>>> on this matter then.. >>>> Yes! >>>> >>> I just read this thread again https://patchwork.kernel.org/patch/802458 >>> 1/ to recall the previous discussion. >>> >>> https://patchwork.kernel.org/patch/8149891/ >>> https://patchwork.kernel.org/patch/8149861/ >>> should be the solution made based on Ingo' suggestion, right? >>> >>> And to me, this sounds like the right direction to go, thermal does not >>> need a back up shutdown solution, it just needs a kernel function call >>> which guarantees the system can be shutdown/reboot immediately. >>> >>> is there any reason that patch 1/2 is not accepted? >> >> Zhang, >> >> http://www.serverphorums.com/read.php?12,1400964 >> >> I got a NAK from Alan and was given this direction on a thermal_poweroff >> which is more or less what is done in this patch. >> > > > Actually, Alan's suggestion is more for you to define a > thermal_poweroff() that can be defined per architecture. > > Also, please, keep track of your patch versions and also do copy > everybody who has stated their opinion on previous discussions. These > patches must have Ingo, Alan, and RMK copied too. In this way we avoid > loosing track of what has been suggested and we also converge faster to > something everybody (or most of us) agree. Next version, please, fix > that. > > > To me, thermal core needs a function that simply powers off the system. > No timeouts, delayed works, backups, etc. Simple and straight. > > The idea of having a per architecture implementation, as per Alan's > suggestion, makes sense to me too. Having something different from > pm_power_off(), specific to thermal, might also give the opportunity to > save the power off reason. > > BR, > > Eduardo Valentin >
On Wed, 2017-04-12 at 13:25 +0530, Keerthy wrote: > > On Wednesday 12 April 2017 09:35 AM, Eduardo Valentin wrote: > > > > Keerthy, > > > > On Wed, Apr 12, 2017 at 09:09:36AM +0530, Keerthy wrote: > > > > > > > > > > > > On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote: > > > > > > > > On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote: > > > > > > > > > > > > > > > On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote: > > > > > > > > > > > > > > > > > > Hey, > > > > > > > > > > > > On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: > > > > > > > > > > > > > > > > > > > > > off). > > <cut> > > > > > > > > > > > > > > > > > > > > > > > > > > OK... This seams to me, still a corner case supposed to be > > > > > > fixed at > > > > > > orderly_power_off, not at thermal. But.. > > > > > > > > ^^^ Then again, this must be fixed not at thermal core. And re- > > reading > > the history of this thread, this seams to be really something > > broken at > > OMAP/DRA7, as mentioned in previous messages. That is probably why > > you > > are pushing for pm_power_off(), which seams to be the one that > > works for > > you, pulling the plug correctly (DRA7). > Zhang/Eduardo, > > OMAP5/DRA7 is one case. > > I believe i this is the root cause of this failure. > > thermal_zone_device_check --> thermal_zone_device_update --> > handle_thermal_trip --> handle_critical_trips --> orderly_poweroff > > The above sequence happens every 250/500 mS based on the > configuration. > The orderly_poweroff function is getting called every 250/500 mS and > i > see with a full fledged nfs file system it takes at least 5-10 > Seconds > to shutdown and during that time we bombard with orderly_poweroff > calls > multiple times due to the thermal_zone_device_check triggering > periodically. > > To confirm that i made sure that handle_critical_trips calls > orderly_poweroff only once and i no longer see the failure on DRA72- > EVM > board. > Nice catch! > So IMHO once we get to handle_critical_trips case where we do > orderly_poweroff we need to do the following: > > 1) Make sure that orderly_poweroff is called only once. agreed. > 2) Cancel all the scheduled work queues to monitor the temperature as > we have already reached a point of shutting down the system. > agreed. now I think we've found the root cause of the problem. orderly_poweroff() is not reenterable and it does not have to be. If we're using orderly_poweroff() for emergency power off, we have to use it correctly. will you generate a patch to do this? thanks, rui > Let me know your thoughts on this. > > Best Regards, > Keerthy > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, there is no clean way of detecting such failure > > > > > > > of > > > > > > > userspace > > > > > > > powering off the system. In such scenarios, it is > > > > > > > necessary for a > > > > > > > backup > > > > > > > workqueue to be able to force a shutdown of the system > > > > > > > when > > > > > > > orderly > > > > > > > shutdown is not successful after a configurable time > > > > > > > period. > > > > > > > > > > > > > Given that system running hot is a thermal issue, I guess > > > > > > we care > > > > > > more > > > > > > on this matter then.. > > > > > Yes! > > > > > > > > > I just read this thread again https://patchwork.kernel.org/patc > > > > h/802458 > > > > 1/ to recall the previous discussion. > > > > > > > > https://patchwork.kernel.org/patch/8149891/ > > > > https://patchwork.kernel.org/patch/8149861/ > > > > should be the solution made based on Ingo' suggestion, right? > > > > > > > > And to me, this sounds like the right direction to go, thermal > > > > does not > > > > need a back up shutdown solution, it just needs a kernel > > > > function call > > > > which guarantees the system can be shutdown/reboot immediately. > > > > > > > > is there any reason that patch 1/2 is not accepted? > > > Zhang, > > > > > > http://www.serverphorums.com/read.php?12,1400964 > > > > > > I got a NAK from Alan and was given this direction on a > > > thermal_poweroff > > > which is more or less what is done in this patch. > > > > > > > Actually, Alan's suggestion is more for you to define a > > thermal_poweroff() that can be defined per architecture. > > > > Also, please, keep track of your patch versions and also do copy > > everybody who has stated their opinion on previous discussions. > > These > > patches must have Ingo, Alan, and RMK copied too. In this way we > > avoid > > loosing track of what has been suggested and we also converge > > faster to > > something everybody (or most of us) agree. Next version, please, > > fix > > that. > > > > > > To me, thermal core needs a function that simply powers off the > > system. > > No timeouts, delayed works, backups, etc. Simple and straight. > > > > The idea of having a per architecture implementation, as per Alan's > > suggestion, makes sense to me too. Having something different from > > pm_power_off(), specific to thermal, might also give the > > opportunity to > > save the power off reason. > > > > BR, > > > > Eduardo Valentin > >
On Wednesday 12 April 2017 01:56 PM, Zhang Rui wrote: > On Wed, 2017-04-12 at 13:25 +0530, Keerthy wrote: >> >> On Wednesday 12 April 2017 09:35 AM, Eduardo Valentin wrote: >>> >>> Keerthy, >>> >>> On Wed, Apr 12, 2017 at 09:09:36AM +0530, Keerthy wrote: >>>> >>>> >>>> >>>> On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote: >>>>> >>>>> On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote: >>>>>> >>>>>> >>>>>> On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin wrote: >>>>>>> >>>>>>> >>>>>>> Hey, >>>>>>> >>>>>>> On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy wrote: >>>>>>>> >>>>>>>> >>>>>>>> off). >>> <cut> >>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>> OK... This seams to me, still a corner case supposed to be >>>>>>> fixed at >>>>>>> orderly_power_off, not at thermal. But.. >>>>>>> >>> ^^^ Then again, this must be fixed not at thermal core. And re- >>> reading >>> the history of this thread, this seams to be really something >>> broken at >>> OMAP/DRA7, as mentioned in previous messages. That is probably why >>> you >>> are pushing for pm_power_off(), which seams to be the one that >>> works for >>> you, pulling the plug correctly (DRA7). >> Zhang/Eduardo, >> >> OMAP5/DRA7 is one case. >> >> I believe i this is the root cause of this failure. >> >> thermal_zone_device_check --> thermal_zone_device_update --> >> handle_thermal_trip --> handle_critical_trips --> orderly_poweroff >> >> The above sequence happens every 250/500 mS based on the >> configuration. >> The orderly_poweroff function is getting called every 250/500 mS and >> i >> see with a full fledged nfs file system it takes at least 5-10 >> Seconds >> to shutdown and during that time we bombard with orderly_poweroff >> calls >> multiple times due to the thermal_zone_device_check triggering >> periodically. >> >> To confirm that i made sure that handle_critical_trips calls >> orderly_poweroff only once and i no longer see the failure on DRA72- >> EVM >> board. >> > Nice catch! Thanks. > >> So IMHO once we get to handle_critical_trips case where we do >> orderly_poweroff we need to do the following: >> >> 1) Make sure that orderly_poweroff is called only once. > > agreed. > >> 2) Cancel all the scheduled work queues to monitor the temperature as >> we have already reached a point of shutting down the system. >> > agreed. > > now I think we've found the root cause of the problem. > orderly_poweroff() is not reenterable and it does not have to be. > If we're using orderly_poweroff() for emergency power off, we have to > use it correctly. > > will you generate a patch to do this? Sure. I will generate a patch to take care of 1) To make sure that orderly_poweroff is called only once right away. I have already tested. for 2) Cancel all the scheduled work queues to monitor the temperature. I will take some more time to make it and test. Is that okay? Or you want me to send both together? Regards, Keerthy > > thanks, > rui > >> Let me know your thoughts on this. >> >> Best Regards, >> Keerthy >>> >>> >>>> >>>>> >>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> However, there is no clean way of detecting such failure >>>>>>>> of >>>>>>>> userspace >>>>>>>> powering off the system. In such scenarios, it is >>>>>>>> necessary for a >>>>>>>> backup >>>>>>>> workqueue to be able to force a shutdown of the system >>>>>>>> when >>>>>>>> orderly >>>>>>>> shutdown is not successful after a configurable time >>>>>>>> period. >>>>>>>> >>>>>>> Given that system running hot is a thermal issue, I guess >>>>>>> we care >>>>>>> more >>>>>>> on this matter then.. >>>>>> Yes! >>>>>> >>>>> I just read this thread again https://patchwork.kernel.org/patc >>>>> h/802458 >>>>> 1/ to recall the previous discussion. >>>>> >>>>> https://patchwork.kernel.org/patch/8149891/ >>>>> https://patchwork.kernel.org/patch/8149861/ >>>>> should be the solution made based on Ingo' suggestion, right? >>>>> >>>>> And to me, this sounds like the right direction to go, thermal >>>>> does not >>>>> need a back up shutdown solution, it just needs a kernel >>>>> function call >>>>> which guarantees the system can be shutdown/reboot immediately. >>>>> >>>>> is there any reason that patch 1/2 is not accepted? >>>> Zhang, >>>> >>>> http://www.serverphorums.com/read.php?12,1400964 >>>> >>>> I got a NAK from Alan and was given this direction on a >>>> thermal_poweroff >>>> which is more or less what is done in this patch. >>>> >>> >>> Actually, Alan's suggestion is more for you to define a >>> thermal_poweroff() that can be defined per architecture. >>> >>> Also, please, keep track of your patch versions and also do copy >>> everybody who has stated their opinion on previous discussions. >>> These >>> patches must have Ingo, Alan, and RMK copied too. In this way we >>> avoid >>> loosing track of what has been suggested and we also converge >>> faster to >>> something everybody (or most of us) agree. Next version, please, >>> fix >>> that. >>> >>> >>> To me, thermal core needs a function that simply powers off the >>> system. >>> No timeouts, delayed works, backups, etc. Simple and straight. >>> >>> The idea of having a per architecture implementation, as per Alan's >>> suggestion, makes sense to me too. Having something different from >>> pm_power_off(), specific to thermal, might also give the >>> opportunity to >>> save the power off reason. >>> >>> BR, >>> >>> Eduardo Valentin >>>
On Wed, 2017-04-12 at 14:06 +0530, Keerthy wrote: > > On Wednesday 12 April 2017 01:56 PM, Zhang Rui wrote: > > > > On Wed, 2017-04-12 at 13:25 +0530, Keerthy wrote: > > > > > > > > > On Wednesday 12 April 2017 09:35 AM, Eduardo Valentin wrote: > > > > > > > > > > > > Keerthy, > > > > > > > > On Wed, Apr 12, 2017 at 09:09:36AM +0530, Keerthy wrote: > > > > > > > > > > > > > > > > > > > > > > > > > On Wednesday 12 April 2017 08:50 AM, Zhang Rui wrote: > > > > > > > > > > > > > > > > > > On Wed, 2017-04-12 at 08:19 +0530, Keerthy wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Tuesday 11 April 2017 10:59 PM, Eduardo Valentin > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Hey, > > > > > > > > > > > > > > > > On Fri, Mar 31, 2017 at 12:00:20PM +0530, Keerthy > > > > > > > > wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > off). > > > > <cut> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > OK... This seams to me, still a corner case supposed to > > > > > > > > be > > > > > > > > fixed at > > > > > > > > orderly_power_off, not at thermal. But.. > > > > > > > > > > > > ^^^ Then again, this must be fixed not at thermal core. And re- > > > > reading > > > > the history of this thread, this seams to be really something > > > > broken at > > > > OMAP/DRA7, as mentioned in previous messages. That is probably > > > > why > > > > you > > > > are pushing for pm_power_off(), which seams to be the one that > > > > works for > > > > you, pulling the plug correctly (DRA7). > > > Zhang/Eduardo, > > > > > > OMAP5/DRA7 is one case. > > > > > > I believe i this is the root cause of this failure. > > > > > > thermal_zone_device_check --> thermal_zone_device_update --> > > > handle_thermal_trip --> handle_critical_trips --> > > > orderly_poweroff > > > > > > The above sequence happens every 250/500 mS based on the > > > configuration. > > > The orderly_poweroff function is getting called every 250/500 mS > > > and > > > i > > > see with a full fledged nfs file system it takes at least 5-10 > > > Seconds > > > to shutdown and during that time we bombard with orderly_poweroff > > > calls > > > multiple times due to the thermal_zone_device_check triggering > > > periodically. > > > > > > To confirm that i made sure that handle_critical_trips calls > > > orderly_poweroff only once and i no longer see the failure on > > > DRA72- > > > EVM > > > board. > > > > > Nice catch! > Thanks. > > > > > > > > > > > So IMHO once we get to handle_critical_trips case where we do > > > orderly_poweroff we need to do the following: > > > > > > 1) Make sure that orderly_poweroff is called only once. > > agreed. > > > > > > > > 2) Cancel all the scheduled work queues to monitor the > > > temperature as > > > we have already reached a point of shutting down the system. > > > > > agreed. > > > > now I think we've found the root cause of the problem. > > orderly_poweroff() is not reenterable and it does not have to be. > > If we're using orderly_poweroff() for emergency power off, we have > > to > > use it correctly. > > > > will you generate a patch to do this? > Sure. I will generate a patch to take care of 1) To make sure that > orderly_poweroff is called only once right away. I have already > tested. > > for 2) Cancel all the scheduled work queues to monitor the > temperature. > I will take some more time to make it and test. > > Is that okay? Or you want me to send both together? > I think you can send patch for step 1 first. thanks, rui > Regards, > Keerthy > > > > > > > thanks, > > rui > > > > > > > > Let me know your thoughts on this. > > > > > > Best Regards, > > > Keerthy > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, there is no clean way of detecting such > > > > > > > > > failure > > > > > > > > > of > > > > > > > > > userspace > > > > > > > > > powering off the system. In such scenarios, it is > > > > > > > > > necessary for a > > > > > > > > > backup > > > > > > > > > workqueue to be able to force a shutdown of the > > > > > > > > > system > > > > > > > > > when > > > > > > > > > orderly > > > > > > > > > shutdown is not successful after a configurable time > > > > > > > > > period. > > > > > > > > > > > > > > > > > Given that system running hot is a thermal issue, I > > > > > > > > guess > > > > > > > > we care > > > > > > > > more > > > > > > > > on this matter then.. > > > > > > > Yes! > > > > > > > > > > > > > I just read this thread again https://patchwork.kernel.org/ > > > > > > patc > > > > > > h/802458 > > > > > > 1/ to recall the previous discussion. > > > > > > > > > > > > https://patchwork.kernel.org/patch/8149891/ > > > > > > https://patchwork.kernel.org/patch/8149861/ > > > > > > should be the solution made based on Ingo' suggestion, > > > > > > right? > > > > > > > > > > > > And to me, this sounds like the right direction to go, > > > > > > thermal > > > > > > does not > > > > > > need a back up shutdown solution, it just needs a kernel > > > > > > function call > > > > > > which guarantees the system can be shutdown/reboot > > > > > > immediately. > > > > > > > > > > > > is there any reason that patch 1/2 is not accepted? > > > > > Zhang, > > > > > > > > > > http://www.serverphorums.com/read.php?12,1400964 > > > > > > > > > > I got a NAK from Alan and was given this direction on a > > > > > thermal_poweroff > > > > > which is more or less what is done in this patch. > > > > > > > > > Actually, Alan's suggestion is more for you to define a > > > > thermal_poweroff() that can be defined per architecture. > > > > > > > > Also, please, keep track of your patch versions and also do > > > > copy > > > > everybody who has stated their opinion on previous discussions. > > > > These > > > > patches must have Ingo, Alan, and RMK copied too. In this way > > > > we > > > > avoid > > > > loosing track of what has been suggested and we also converge > > > > faster to > > > > something everybody (or most of us) agree. Next version, > > > > please, > > > > fix > > > > that. > > > > > > > > > > > > To me, thermal core needs a function that simply powers off the > > > > system. > > > > No timeouts, delayed works, backups, etc. Simple and straight. > > > > > > > > The idea of having a per architecture implementation, as per > > > > Alan's > > > > suggestion, makes sense to me too. Having something different > > > > from > > > > pm_power_off(), specific to thermal, might also give the > > > > opportunity to > > > > save the power off reason. > > > > > > > > BR, > > > > > > > > Eduardo Valentin > > > >
Hello, On Wed, Apr 12, 2017 at 04:45:44PM +0800, Zhang Rui wrote: <cut> > > > > Zhang/Eduardo, > > > > > > > > OMAP5/DRA7 is one case. > > > > > > > > I believe i this is the root cause of this failure. > > > > > > > > thermal_zone_device_check --> thermal_zone_device_update --> > > > > handle_thermal_trip --> handle_critical_trips --> > > > > orderly_poweroff > > > > > > > > The above sequence happens every 250/500 mS based on the > > > > configuration. > > > > The orderly_poweroff function is getting called every 250/500 mS > > > > and > > > > i > > > > see with a full fledged nfs file system it takes at least 5-10 > > > > Seconds > > > > to shutdown and during that time we bombard with orderly_poweroff > > > > calls > > > > multiple times due to the thermal_zone_device_check triggering > > > > periodically. I see. A couple of questions here: a. A regular shutdown command on your setup takes 5 to 10 s? What is the PHY underneath your NFS? 56K modem? b. Or did you mean it takes 5 to 10 s because you keep calling orderly_poweroff? > > > > > > > > To confirm that i made sure that handle_critical_trips calls > > > > orderly_poweroff only once and i no longer see the failure on > > > > DRA72- > > > > EVM > > > > board. > > > > > > > Nice catch! Ok. Nice. But how long does it take? > > Thanks. > > > > > > > > > > > > > > > > So IMHO once we get to handle_critical_trips case where we do > > > > orderly_poweroff we need to do the following: > > > > > > > > 1) Make sure that orderly_poweroff is called only once. > > > agreed. > > > > > > > > > > > 2) Cancel all the scheduled work queues to monitor the > > > > temperature as > > > > we have already reached a point of shutting down the system. > > > > > > > agreed. > > > > > > now I think we've found the root cause of the problem. > > > orderly_poweroff() is not reenterable and it does not have to be. Well, why not? Because we assume that all sources of shutdown within kernel are all gonna happen in different time? What if thermal calls and another subsystem/driver calls it too. Does work if user space also calls shutdown in the middle of a thermal shutdown? I think we need to think this through a bit more.. > > > If we're using orderly_poweroff() for emergency power off, we have > > > to > > > use it correctly. > > > I agree. But there it nothing that says it is not reenterable. If you saw something in this line, can you please share? > > > will you generate a patch to do this? > > Sure. I will generate a patch to take care of 1) To make sure that > > orderly_poweroff is called only once right away. I have already > > tested. > > > > for 2) Cancel all the scheduled work queues to monitor the > > temperature. > > I will take some more time to make it and test. > > > > Is that okay? Or you want me to send both together? > > > I think you can send patch for step 1 first. I am happy to see that Keerthy found the problem with his setup and a possible solution. But I have a few concerns here. 1. If regular shutdown process takes 10seconds, that is a ballpark that thermal should never wait. orderly_poweroff() calls run_cmd() with wait flag set. That means, if regular userland shutdown takes 10s, we are waiting for it. Obviously this not acceptable. Specially if you setup critical trip to be 125C. Now, if you properly size the critical trip to fire before hotspot really reach 125C, for 10s (or the time it takes to shutdown), then fine. But based on what was described in this thread, his system is waiting 10s on regular shutdown, and his silicon is on out-of-spec temperature for 10s, which is wrong. 2. The above scenario is not acceptable in a long run, specially from a reliability perspective. If orderly_poweroff() has a possibility to simply never return (or take too long), I would say the thermal subsystem is using the wrong API. If you are going to implement the above two patches, keep in mind: i. At least within the thermal subsystem, you need to take care of all zones that could trigger a shutdown. ii. serializing the calls to orderly_poweroff() seams to be more concerning than cancelling all monitoring. BR, Eduardo Valentin > > thanks, > rui > > Regards, > > Keerthy > > > > > > > > > > > thanks, > > > rui > > > > > > > > > > > Let me know your thoughts on this. > > > > > > > > Best Regards, > > > > Keerthy > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > However, there is no clean way of detecting such > > > > > > > > > > failure > > > > > > > > > > of > > > > > > > > > > userspace > > > > > > > > > > powering off the system. In such scenarios, it is > > > > > > > > > > necessary for a > > > > > > > > > > backup > > > > > > > > > > workqueue to be able to force a shutdown of the > > > > > > > > > > system > > > > > > > > > > when > > > > > > > > > > orderly > > > > > > > > > > shutdown is not successful after a configurable time > > > > > > > > > > period. > > > > > > > > > > > > > > > > > > > Given that system running hot is a thermal issue, I > > > > > > > > > guess > > > > > > > > > we care > > > > > > > > > more > > > > > > > > > on this matter then.. > > > > > > > > Yes! > > > > > > > > > > > > > > > I just read this thread again https://patchwork.kernel.org/ > > > > > > > patc > > > > > > > h/802458 > > > > > > > 1/ to recall the previous discussion. > > > > > > > > > > > > > > https://patchwork.kernel.org/patch/8149891/ > > > > > > > https://patchwork.kernel.org/patch/8149861/ > > > > > > > should be the solution made based on Ingo' suggestion, > > > > > > > right? > > > > > > > > > > > > > > And to me, this sounds like the right direction to go, > > > > > > > thermal > > > > > > > does not > > > > > > > need a back up shutdown solution, it just needs a kernel > > > > > > > function call > > > > > > > which guarantees the system can be shutdown/reboot > > > > > > > immediately. > > > > > > > > > > > > > > is there any reason that patch 1/2 is not accepted? > > > > > > Zhang, > > > > > > > > > > > > http://www.serverphorums.com/read.php?12,1400964 > > > > > > > > > > > > I got a NAK from Alan and was given this direction on a > > > > > > thermal_poweroff > > > > > > which is more or less what is done in this patch. > > > > > > > > > > > Actually, Alan's suggestion is more for you to define a > > > > > thermal_poweroff() that can be defined per architecture. > > > > > > > > > > Also, please, keep track of your patch versions and also do > > > > > copy > > > > > everybody who has stated their opinion on previous discussions. > > > > > These > > > > > patches must have Ingo, Alan, and RMK copied too. In this way > > > > > we > > > > > avoid > > > > > loosing track of what has been suggested and we also converge > > > > > faster to > > > > > something everybody (or most of us) agree. Next version, > > > > > please, > > > > > fix > > > > > that. > > > > > > > > > > > > > > > To me, thermal core needs a function that simply powers off the > > > > > system. > > > > > No timeouts, delayed works, backups, etc. Simple and straight. > > > > > > > > > > The idea of having a per architecture implementation, as per > > > > > Alan's > > > > > suggestion, makes sense to me too. Having something different > > > > > from > > > > > pm_power_off(), specific to thermal, might also give the > > > > > opportunity to > > > > > save the power off reason. > > > > > > > > > > BR, > > > > > > > > > > Eduardo Valentin > > > > >
On Wednesday 12 April 2017 09:14 PM, Eduardo Valentin wrote: > Hello, > > On Wed, Apr 12, 2017 at 04:45:44PM +0800, Zhang Rui wrote: > > <cut> > >>>>> Zhang/Eduardo, >>>>> >>>>> OMAP5/DRA7 is one case. >>>>> >>>>> I believe i this is the root cause of this failure. >>>>> >>>>> thermal_zone_device_check --> thermal_zone_device_update --> >>>>> handle_thermal_trip --> handle_critical_trips --> >>>>> orderly_poweroff >>>>> >>>>> The above sequence happens every 250/500 mS based on the >>>>> configuration. >>>>> The orderly_poweroff function is getting called every 250/500 mS >>>>> and >>>>> i >>>>> see with a full fledged nfs file system it takes at least 5-10 >>>>> Seconds >>>>> to shutdown and during that time we bombard with orderly_poweroff >>>>> calls >>>>> multiple times due to the thermal_zone_device_check triggering >>>>> periodically. > > I see. A couple of questions here: > a. A regular shutdown command on your setup takes 5 to 10 s? What is the > PHY underneath your NFS? 56K modem? Its not 56K modem but also i am not running on busybox! Its a full fledged arago file system. Yes i have run a basic poweroff and it takes about 5S. I will share the logs with timings the first thing tomorrow. > b. Or did you mean it takes 5 to 10 s because you keep calling > orderly_poweroff? If we keep calling orderly_poweroff it would never shutdown. Hence the issue. > >>>>> >>>>> To confirm that i made sure that handle_critical_trips calls >>>>> orderly_poweroff only once and i no longer see the failure on >>>>> DRA72- >>>>> EVM >>>>> board. >>>>> > > >>>> Nice catch! > > Ok. Nice. But how long does it take? About 5-10S as i mentioned. First and foremost there is an issue here where in we keep calling orderly_poweroff which needs to be addressed. > >>> Thanks. >>> >>>> >>>> >>>>> >>>>> So IMHO once we get to handle_critical_trips case where we do >>>>> orderly_poweroff we need to do the following: >>>>> >>>>> 1) Make sure that orderly_poweroff is called only once. >>>> agreed. >>>> >>>>> >>>>> 2) Cancel all the scheduled work queues to monitor the >>>>> temperature as >>>>> we have already reached a point of shutting down the system. >>>>> >>>> agreed. >>>> >>>> now I think we've found the root cause of the problem. >>>> orderly_poweroff() is not reenterable and it does not have to be. > > > Well, why not? Because we assume that all sources of shutdown within > kernel are all gonna happen in different time? What if thermal calls and > another subsystem/driver calls it too. Does work if user space also > calls shutdown in the middle of a thermal shutdown? I think we need to > think this through a bit more.. Definitely we need to think a lot more but point agreed. Why is thermal framework calling orderly_poweroff multiple times? Say even if you manage to shut off in 2 seconds you still end up calling 4 to 8 times depending on 500mS or 250mS delay. > >>>> If we're using orderly_poweroff() for emergency power off, we have >>>> to >>>> use it correctly. >>>> > > I agree. But there it nothing that says it is not reenterable. If you > saw something in this line, can you please share? > >>>> will you generate a patch to do this? >>> Sure. I will generate a patch to take care of 1) To make sure that >>> orderly_poweroff is called only once right away. I have already >>> tested. >>> >>> for 2) Cancel all the scheduled work queues to monitor the >>> temperature. >>> I will take some more time to make it and test. >>> >>> Is that okay? Or you want me to send both together? >>> >> I think you can send patch for step 1 first. > > I am happy to see that Keerthy found the problem with his setup and a > possible solution. But I have a few concerns here. > > 1. If regular shutdown process takes 10seconds, that is a ballpark that > thermal should never wait. orderly_poweroff() calls run_cmd() with wait > flag set. That means, if regular userland shutdown takes 10s, we are > waiting for it. Obviously this not acceptable. Specially if you setup > critical trip to be 125C. Now, if you properly size the critical trip to > fire before hotspot really reach 125C, for 10s (or the time it takes to > shutdown), then fine. But based on what was described in this thread, > his system is waiting 10s on regular shutdown, and his silicon is on > out-of-spec temperature for 10s, which is wrong. 2 approaches can be taken here: 1) Reduce the critical temperature to something lesser than the hardware critical point. Or 2) Call kernel_power_off directly as you are in a pretty critical situation! That only takes less than a second and powers off the PMIC at least on OMAP5/DRA7. > > 2. The above scenario is not acceptable in a long run, specially from a > reliability perspective. If orderly_poweroff() has a possibility to > simply never return (or take too long), I would say the thermal > subsystem is using the wrong API. As mentioned above kernel_power_off? > > > If you are going to implement the above two patches, keep in mind: > i. At least within the thermal subsystem, you need to take care of all > zones that could trigger a shutdown. Do you think it makes sense for all the 'n' sensors to trigger orderly_poweroff one by one? Or we should worry about the first source and ensure that it shuts off the system? Is it not enough to catch the first critical alert and poweroff > ii. serializing the calls to orderly_poweroff() seams to be more > concerning than cancelling all monitoring. > > > BR, > > Eduardo Valentin >> >> thanks, >> rui >>> Regards, >>> Keerthy >>> >>>> >>>> >>>> thanks, >>>> rui >>>> >>>>> >>>>> Let me know your thoughts on this. >>>>> >>>>> Best Regards, >>>>> Keerthy >>>>>> >>>>>> >>>>>> >>>>>>> >>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> However, there is no clean way of detecting such >>>>>>>>>>> failure >>>>>>>>>>> of >>>>>>>>>>> userspace >>>>>>>>>>> powering off the system. In such scenarios, it is >>>>>>>>>>> necessary for a >>>>>>>>>>> backup >>>>>>>>>>> workqueue to be able to force a shutdown of the >>>>>>>>>>> system >>>>>>>>>>> when >>>>>>>>>>> orderly >>>>>>>>>>> shutdown is not successful after a configurable time >>>>>>>>>>> period. >>>>>>>>>>> >>>>>>>>>> Given that system running hot is a thermal issue, I >>>>>>>>>> guess >>>>>>>>>> we care >>>>>>>>>> more >>>>>>>>>> on this matter then.. >>>>>>>>> Yes! >>>>>>>>> >>>>>>>> I just read this thread again https://patchwork.kernel.org/ >>>>>>>> patc >>>>>>>> h/802458 >>>>>>>> 1/ to recall the previous discussion. >>>>>>>> >>>>>>>> https://patchwork.kernel.org/patch/8149891/ >>>>>>>> https://patchwork.kernel.org/patch/8149861/ >>>>>>>> should be the solution made based on Ingo' suggestion, >>>>>>>> right? >>>>>>>> >>>>>>>> And to me, this sounds like the right direction to go, >>>>>>>> thermal >>>>>>>> does not >>>>>>>> need a back up shutdown solution, it just needs a kernel >>>>>>>> function call >>>>>>>> which guarantees the system can be shutdown/reboot >>>>>>>> immediately. >>>>>>>> >>>>>>>> is there any reason that patch 1/2 is not accepted? >>>>>>> Zhang, >>>>>>> >>>>>>> http://www.serverphorums.com/read.php?12,1400964 >>>>>>> >>>>>>> I got a NAK from Alan and was given this direction on a >>>>>>> thermal_poweroff >>>>>>> which is more or less what is done in this patch. >>>>>>> >>>>>> Actually, Alan's suggestion is more for you to define a >>>>>> thermal_poweroff() that can be defined per architecture. >>>>>> >>>>>> Also, please, keep track of your patch versions and also do >>>>>> copy >>>>>> everybody who has stated their opinion on previous discussions. >>>>>> These >>>>>> patches must have Ingo, Alan, and RMK copied too. In this way >>>>>> we >>>>>> avoid >>>>>> loosing track of what has been suggested and we also converge >>>>>> faster to >>>>>> something everybody (or most of us) agree. Next version, >>>>>> please, >>>>>> fix >>>>>> that. >>>>>> >>>>>> >>>>>> To me, thermal core needs a function that simply powers off the >>>>>> system. >>>>>> No timeouts, delayed works, backups, etc. Simple and straight. >>>>>> >>>>>> The idea of having a per architecture implementation, as per >>>>>> Alan's >>>>>> suggestion, makes sense to me too. Having something different >>>>>> from >>>>>> pm_power_off(), specific to thermal, might also give the >>>>>> opportunity to >>>>>> save the power off reason. >>>>>> >>>>>> BR, >>>>>> >>>>>> Eduardo Valentin >>>>>>
On 04/12/2017 10:44 AM, Eduardo Valentin wrote: > Hello, > ... > > I agree. But there it nothing that says it is not reenterable. If you > saw something in this line, can you please share? > >>>> will you generate a patch to do this? >>> Sure. I will generate a patch to take care of 1) To make sure that >>> orderly_poweroff is called only once right away. I have already >>> tested. >>> >>> for 2) Cancel all the scheduled work queues to monitor the >>> temperature. >>> I will take some more time to make it and test. >>> >>> Is that okay? Or you want me to send both together? >>> >> I think you can send patch for step 1 first. > > I am happy to see that Keerthy found the problem with his setup and a > possible solution. But I have a few concerns here. > > 1. If regular shutdown process takes 10seconds, that is a ballpark that > thermal should never wait. orderly_poweroff() calls run_cmd() with wait > flag set. That means, if regular userland shutdown takes 10s, we are > waiting for it. Obviously this not acceptable. Specially if you setup > critical trip to be 125C. Now, if you properly size the critical trip to > fire before hotspot really reach 125C, for 10s (or the time it takes to > shutdown), then fine. But based on what was described in this thread, > his system is waiting 10s on regular shutdown, and his silicon is on > out-of-spec temperature for 10s, which is wrong. > > 2. The above scenario is not acceptable in a long run, specially from a > reliability perspective. If orderly_poweroff() has a possibility to > simply never return (or take too long), I would say the thermal > subsystem is using the wrong API. > Hh, I do not see that orderly_poweroff() will wait for anything now: void orderly_poweroff(bool force) { if (force) /* do not override the pending "true" */ poweroff_force = true; schedule_work(&poweroff_work); ^^^^^^^ async call. even here can be pretty big delay if system is under pressure } static int __orderly_poweroff(bool force) { int ret; ret = run_cmd(poweroff_cmd); ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC if (ret && force) { pr_warn("Failed to start orderly shutdown: forcing the issue\n"); /* * I guess this should try to kick off some daemon to sync and * poweroff asap. Or not even bother syncing if we're doing an * emergency shutdown? */ emergency_sync(); kernel_power_off(); ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist } return ret; } static bool poweroff_force; static void poweroff_work_func(struct work_struct *work) { __orderly_poweroff(poweroff_force); } As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result of US poweroff binary execution. > > If you are going to implement the above two patches, keep in mind: > i. At least within the thermal subsystem, you need to take care of all > zones that could trigger a shutdown. > ii. serializing the calls to orderly_poweroff() seams to be more > concerning than cancelling all monitoring. > >
Hey, On Wed, Apr 12, 2017 at 11:31:18AM -0500, Grygorii Strashko wrote: > > > On 04/12/2017 10:44 AM, Eduardo Valentin wrote: > > Hello, > > > ... > > > > > I agree. But there it nothing that says it is not reenterable. If you > > saw something in this line, can you please share? > > > >>>> will you generate a patch to do this? > >>> Sure. I will generate a patch to take care of 1) To make sure that > >>> orderly_poweroff is called only once right away. I have already > >>> tested. > >>> > >>> for 2) Cancel all the scheduled work queues to monitor the > >>> temperature. > >>> I will take some more time to make it and test. > >>> > >>> Is that okay? Or you want me to send both together? > >>> > >> I think you can send patch for step 1 first. > > > > I am happy to see that Keerthy found the problem with his setup and a > > possible solution. But I have a few concerns here. > > > > 1. If regular shutdown process takes 10seconds, that is a ballpark that > > thermal should never wait. orderly_poweroff() calls run_cmd() with wait > > flag set. That means, if regular userland shutdown takes 10s, we are > > waiting for it. Obviously this not acceptable. Specially if you setup > > critical trip to be 125C. Now, if you properly size the critical trip to > > fire before hotspot really reach 125C, for 10s (or the time it takes to > > shutdown), then fine. But based on what was described in this thread, > > his system is waiting 10s on regular shutdown, and his silicon is on > > out-of-spec temperature for 10s, which is wrong. > > > > 2. The above scenario is not acceptable in a long run, specially from a > > reliability perspective. If orderly_poweroff() has a possibility to > > simply never return (or take too long), I would say the thermal > > subsystem is using the wrong API. > > > > > Hh, I do not see that orderly_poweroff() will wait for anything now: > void orderly_poweroff(bool force) > { > if (force) /* do not override the pending "true" */ > poweroff_force = true; > schedule_work(&poweroff_work); > ^^^^^^^ async call. even here can be pretty big delay if system is under pressure > } > > > static int __orderly_poweroff(bool force) > { > int ret; > > ret = run_cmd(poweroff_cmd); > ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC Yeah, and that is what I really meant. Sorry for the confusion. The exec is problematic in his scenario too, given he is running on a very interesting NFS setup. Yes, the WAIT_EXEC is set: 392 static int run_cmd(const char *cmd) 393 { 394 char **argv; 395 static char *envp[] = { 396 "HOME=/", 397 "PATH=/sbin:/bin:/usr/sbin:/usr/bin", 398 NULL 399 }; 400 int ret; 401 argv = argv_split(GFP_KERNEL, cmd, NULL); 402 if (argv) { 403 ret = call_usermodehelper(argv[0], argv, envp, UMH_WAIT_EXEC); 404 argv_free(argv); 405 } else { 406 ret = -ENOMEM; 407 } 408 409 return ret; 410 } 411 > > if (ret && force) { > pr_warn("Failed to start orderly shutdown: forcing the issue\n"); > > /* > * I guess this should try to kick off some daemon to sync and > * poweroff asap. Or not even bother syncing if we're doing an > * emergency shutdown? > */ > emergency_sync(); > kernel_power_off(); > ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist > } > > return ret; > } > > static bool poweroff_force; > > static void poweroff_work_func(struct work_struct *work) > { > __orderly_poweroff(poweroff_force); > } > > As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result > of US poweroff binary execution. > > > > > If you are going to implement the above two patches, keep in mind: > > i. At least within the thermal subsystem, you need to take care of all > > zones that could trigger a shutdown. > > ii. serializing the calls to orderly_poweroff() seams to be more > > concerning than cancelling all monitoring. > > > > > > -- > regards, > -grygorii
On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: > > > On 04/12/2017 10:44 AM, Eduardo Valentin wrote: >> Hello, >> > ... > >> >> I agree. But there it nothing that says it is not reenterable. If you >> saw something in this line, can you please share? >> >>>>> will you generate a patch to do this? >>>> Sure. I will generate a patch to take care of 1) To make sure that >>>> orderly_poweroff is called only once right away. I have already >>>> tested. >>>> >>>> for 2) Cancel all the scheduled work queues to monitor the >>>> temperature. >>>> I will take some more time to make it and test. >>>> >>>> Is that okay? Or you want me to send both together? >>>> >>> I think you can send patch for step 1 first. >> >> I am happy to see that Keerthy found the problem with his setup and a >> possible solution. But I have a few concerns here. >> >> 1. If regular shutdown process takes 10seconds, that is a ballpark that >> thermal should never wait. orderly_poweroff() calls run_cmd() with wait >> flag set. That means, if regular userland shutdown takes 10s, we are >> waiting for it. Obviously this not acceptable. Specially if you setup >> critical trip to be 125C. Now, if you properly size the critical trip to >> fire before hotspot really reach 125C, for 10s (or the time it takes to >> shutdown), then fine. But based on what was described in this thread, >> his system is waiting 10s on regular shutdown, and his silicon is on >> out-of-spec temperature for 10s, which is wrong. >> >> 2. The above scenario is not acceptable in a long run, specially from a >> reliability perspective. If orderly_poweroff() has a possibility to >> simply never return (or take too long), I would say the thermal >> subsystem is using the wrong API. >> > > > Hh, I do not see that orderly_poweroff() will wait for anything now: > void orderly_poweroff(bool force) > { > if (force) /* do not override the pending "true" */ > poweroff_force = true; > schedule_work(&poweroff_work); > ^^^^^^^ async call. even here can be pretty big delay if system is under pressure > } > > > static int __orderly_poweroff(bool force) > { > int ret; > > ret = run_cmd(poweroff_cmd); When i tried with multiple orderly_poweroff calls ret was always 0. So every 250mS i see this ret = 0. > ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC > > if (ret && force) { So it never entered this path. ret = 0 so if is not executed. > pr_warn("Failed to start orderly shutdown: forcing the issue\n"); > > /* > * I guess this should try to kick off some daemon to sync and > * poweroff asap. Or not even bother syncing if we're doing an > * emergency shutdown? > */ > emergency_sync(); > kernel_power_off(); > ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist > } > > return ret; > } > > static bool poweroff_force; > > static void poweroff_work_func(struct work_struct *work) > { > __orderly_poweroff(poweroff_force); > } > > As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result > of US poweroff binary execution. > >> >> If you are going to implement the above two patches, keep in mind: >> i. At least within the thermal subsystem, you need to take care of all >> zones that could trigger a shutdown. >> ii. serializing the calls to orderly_poweroff() seams to be more >> concerning than cancelling all monitoring. >> >> >
Hey On Wed, Apr 12, 2017 at 09:46:47PM +0530, Keerthy wrote: > > > On Wednesday 12 April 2017 09:14 PM, Eduardo Valentin wrote: > > Hello, > > > > On Wed, Apr 12, 2017 at 04:45:44PM +0800, Zhang Rui wrote: > > > > <cut> > > > >>>>> Zhang/Eduardo, > >>>>> > >>>>> OMAP5/DRA7 is one case. > >>>>> > >>>>> I believe i this is the root cause of this failure. > >>>>> > >>>>> thermal_zone_device_check --> thermal_zone_device_update --> > >>>>> handle_thermal_trip --> handle_critical_trips --> > >>>>> orderly_poweroff > >>>>> > >>>>> The above sequence happens every 250/500 mS based on the > >>>>> configuration. > >>>>> The orderly_poweroff function is getting called every 250/500 mS > >>>>> and > >>>>> i > >>>>> see with a full fledged nfs file system it takes at least 5-10 > >>>>> Seconds > >>>>> to shutdown and during that time we bombard with orderly_poweroff > >>>>> calls > >>>>> multiple times due to the thermal_zone_device_check triggering > >>>>> periodically. > > > > I see. A couple of questions here: > > a. A regular shutdown command on your setup takes 5 to 10 s? What is the > > PHY underneath your NFS? 56K modem? > > Its not 56K modem but also i am not running on busybox! OK. :-) > Its a full fledged arago file system. Yes i have run a basic poweroff > and it takes about 5S. I will share the logs with timings the first > thing tomorrow. > I see. > > b. Or did you mean it takes 5 to 10 s because you keep calling > > orderly_poweroff? > > If we keep calling orderly_poweroff it would never shutdown. Hence the > issue. Yeah, if you could share the logs would be great to understand where the wait sits. > > > > >>>>> > >>>>> To confirm that i made sure that handle_critical_trips calls > >>>>> orderly_poweroff only once and i no longer see the failure on > >>>>> DRA72- > >>>>> EVM > >>>>> board. > >>>>> > > > > > >>>> Nice catch! > > > > Ok. Nice. But how long does it take? > > About 5-10S as i mentioned. > > First and foremost there is an issue here where in we keep calling > orderly_poweroff which needs to be addressed. > I agree here. Apparently, the expectations of the API were wrong. I agree on refraining from calling it multiple times before it finishes. But, I said this before, and I will repeat myself. I believe thermal is not the only user of this API, maybe the problem is more apparent for thermal because we call it multiple times, and we want it to finishes, but even after fixing the serialization on thermal side, we can still collide with other parts of the kernel and userland. > > > >>> Thanks. > >>> > >>>> > >>>> > >>>>> > >>>>> So IMHO once we get to handle_critical_trips case where we do > >>>>> orderly_poweroff we need to do the following: > >>>>> > >>>>> 1) Make sure that orderly_poweroff is called only once. > >>>> agreed. > >>>> > >>>>> > >>>>> 2) Cancel all the scheduled work queues to monitor the > >>>>> temperature as > >>>>> we have already reached a point of shutting down the system. > >>>>> > >>>> agreed. > >>>> > >>>> now I think we've found the root cause of the problem. > >>>> orderly_poweroff() is not reenterable and it does not have to be. > > > > > > Well, why not? Because we assume that all sources of shutdown within > > kernel are all gonna happen in different time? What if thermal calls and > > another subsystem/driver calls it too. Does work if user space also > > calls shutdown in the middle of a thermal shutdown? I think we need to > > think this through a bit more.. > > Definitely we need to think a lot more but point agreed. Why is thermal > framework calling orderly_poweroff multiple times? Say even if you > manage to shut off in 2 seconds you still end up calling 4 to 8 times > depending on 500mS or 250mS delay. I agree here. Also, a graceful thermal shutdown may also mean displaying a message, etc. In this case, you have to size properly the trip, accounting shutdown down time, and your reliability expectation. > > > > >>>> If we're using orderly_poweroff() for emergency power off, we have > >>>> to > >>>> use it correctly. > >>>> > > > > I agree. But there it nothing that says it is not reenterable. If you > > saw something in this line, can you please share? > > > >>>> will you generate a patch to do this? > >>> Sure. I will generate a patch to take care of 1) To make sure that > >>> orderly_poweroff is called only once right away. I have already > >>> tested. > >>> > >>> for 2) Cancel all the scheduled work queues to monitor the > >>> temperature. > >>> I will take some more time to make it and test. > >>> > >>> Is that okay? Or you want me to send both together? > >>> > >> I think you can send patch for step 1 first. > > > > I am happy to see that Keerthy found the problem with his setup and a > > possible solution. But I have a few concerns here. > > > > 1. If regular shutdown process takes 10seconds, that is a ballpark that > > thermal should never wait. orderly_poweroff() calls run_cmd() with wait > > flag set. That means, if regular userland shutdown takes 10s, we are > > waiting for it. Obviously this not acceptable. Specially if you setup > > critical trip to be 125C. Now, if you properly size the critical trip to > > fire before hotspot really reach 125C, for 10s (or the time it takes to > > shutdown), then fine. But based on what was described in this thread, > > his system is waiting 10s on regular shutdown, and his silicon is on > > out-of-spec temperature for 10s, which is wrong. > > 2 approaches can be taken here: > > 1) Reduce the critical temperature to something lesser than the hardware > critical point. > > Or > > 2) Call kernel_power_off directly as you are in a pretty critical > situation! That only takes less than a second and powers off the PMIC at > least on OMAP5/DRA7. I think the code needs to allow doing both, actually. Considering both, the silicon and system reliability, and userland (and end user) interaction, the thermal shutdown typically needs to: 1. Make sure it avoids reliability problems, i.e., one shall not allow device to run on out-of-spec temperature. 2. Give the opportunity for the system to gracefully shutdown, so you have the time to keep system state sane (save your data, notify user, etc), even if you are on a 56K modem :-) > > > > > 2. The above scenario is not acceptable in a long run, specially from a > > reliability perspective. If orderly_poweroff() has a possibility to > > simply never return (or take too long), I would say the thermal > > subsystem is using the wrong API. > > As mentioned above kernel_power_off? > > > > > > > If you are going to implement the above two patches, keep in mind: > > i. At least within the thermal subsystem, you need to take care of all > > zones that could trigger a shutdown. > > Do you think it makes sense for all the 'n' sensors to trigger > orderly_poweroff one by one? Or we should worry about the first source > and ensure that it shuts off the system? > > Is it not enough to catch the first critical alert and poweroff I think it is enough if we make sure the first one goes through properly. For accountability purposes, some people would like to also know if other sensors are too hot too, and could be also firing the shutdown. Only making sure that the first shutdown goes all the way through, and block any other thermal shutdowns, it is enough. Then again, I do not think you need to cancel all the monitoring in the system. Given the above points, my suggestion is to: 1. still call orderly_poweroff(), therefore, you still give the opportunity for userland to gracefully power off. 2. but still make sure, once one of the zones hits critical, no other will call orderly_poweroff() 3. Also, when in the critical path, make sure there is no way back, or long delays, allowing system engineer to size the shutdown wait. Shutdown wait is a system property, not a zone property. That is, we eventually call kernel_power_off(). All in all, 1. and 2. above are part of what you found and what has been proposed to make sure we call orderly_poweroff() only once, system wide (or at least thermal subsystem wide). And 3. is pretty much the proposed patch in this series, I think this still needs to go, and I am convinced that thermal core is best place to write the backup mechanism, given the expected variability of orderly_poweroff(). BR, Eduardo Valentin
Keerthy, On Wed, Apr 12, 2017 at 10:14:36PM +0530, Keerthy wrote: > > > On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: > > > > > > On 04/12/2017 10:44 AM, Eduardo Valentin wrote: > >> Hello, > >> > > ... > > > >> > >> I agree. But there it nothing that says it is not reenterable. If you > >> saw something in this line, can you please share? > >> > >>>>> will you generate a patch to do this? > >>>> Sure. I will generate a patch to take care of 1) To make sure that > >>>> orderly_poweroff is called only once right away. I have already > >>>> tested. > >>>> > >>>> for 2) Cancel all the scheduled work queues to monitor the > >>>> temperature. > >>>> I will take some more time to make it and test. > >>>> > >>>> Is that okay? Or you want me to send both together? > >>>> > >>> I think you can send patch for step 1 first. > >> > >> I am happy to see that Keerthy found the problem with his setup and a > >> possible solution. But I have a few concerns here. > >> > >> 1. If regular shutdown process takes 10seconds, that is a ballpark that > >> thermal should never wait. orderly_poweroff() calls run_cmd() with wait > >> flag set. That means, if regular userland shutdown takes 10s, we are > >> waiting for it. Obviously this not acceptable. Specially if you setup > >> critical trip to be 125C. Now, if you properly size the critical trip to > >> fire before hotspot really reach 125C, for 10s (or the time it takes to > >> shutdown), then fine. But based on what was described in this thread, > >> his system is waiting 10s on regular shutdown, and his silicon is on > >> out-of-spec temperature for 10s, which is wrong. > >> > >> 2. The above scenario is not acceptable in a long run, specially from a > >> reliability perspective. If orderly_poweroff() has a possibility to > >> simply never return (or take too long), I would say the thermal > >> subsystem is using the wrong API. > >> > > > > > > Hh, I do not see that orderly_poweroff() will wait for anything now: > > void orderly_poweroff(bool force) > > { > > if (force) /* do not override the pending "true" */ > > poweroff_force = true; > > schedule_work(&poweroff_work); > > ^^^^^^^ async call. even here can be pretty big delay if system is under pressure > > } > > > > > > static int __orderly_poweroff(bool force) > > { > > int ret; > > > > ret = run_cmd(poweroff_cmd); > > When i tried with multiple orderly_poweroff calls ret was always 0. > So every 250mS i see this ret = 0. > > > ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC > > > > if (ret && force) { > > So it never entered this path. ret = 0 so if is not executed. I think your setup has two major problems then: 1. when kernel runs userspace power off, it execs properly, in fact, it is not triggered. 2. when you finally exec it, it takes 5s to finish. If this is correct, I think my suggestions on the other email still holds. BR,
On Wednesday 12 April 2017 10:24 PM, Eduardo Valentin wrote: > Keerthy, > > On Wed, Apr 12, 2017 at 10:14:36PM +0530, Keerthy wrote: >> >> >> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: >>> >>> >>> On 04/12/2017 10:44 AM, Eduardo Valentin wrote: >>>> Hello, >>>> >>> ... >>> >>>> >>>> I agree. But there it nothing that says it is not reenterable. If you >>>> saw something in this line, can you please share? >>>> >>>>>>> will you generate a patch to do this? >>>>>> Sure. I will generate a patch to take care of 1) To make sure that >>>>>> orderly_poweroff is called only once right away. I have already >>>>>> tested. >>>>>> >>>>>> for 2) Cancel all the scheduled work queues to monitor the >>>>>> temperature. >>>>>> I will take some more time to make it and test. >>>>>> >>>>>> Is that okay? Or you want me to send both together? >>>>>> >>>>> I think you can send patch for step 1 first. >>>> >>>> I am happy to see that Keerthy found the problem with his setup and a >>>> possible solution. But I have a few concerns here. >>>> >>>> 1. If regular shutdown process takes 10seconds, that is a ballpark that >>>> thermal should never wait. orderly_poweroff() calls run_cmd() with wait >>>> flag set. That means, if regular userland shutdown takes 10s, we are >>>> waiting for it. Obviously this not acceptable. Specially if you setup >>>> critical trip to be 125C. Now, if you properly size the critical trip to >>>> fire before hotspot really reach 125C, for 10s (or the time it takes to >>>> shutdown), then fine. But based on what was described in this thread, >>>> his system is waiting 10s on regular shutdown, and his silicon is on >>>> out-of-spec temperature for 10s, which is wrong. >>>> >>>> 2. The above scenario is not acceptable in a long run, specially from a >>>> reliability perspective. If orderly_poweroff() has a possibility to >>>> simply never return (or take too long), I would say the thermal >>>> subsystem is using the wrong API. >>>> >>> >>> >>> Hh, I do not see that orderly_poweroff() will wait for anything now: >>> void orderly_poweroff(bool force) >>> { >>> if (force) /* do not override the pending "true" */ >>> poweroff_force = true; >>> schedule_work(&poweroff_work); >>> ^^^^^^^ async call. even here can be pretty big delay if system is under pressure >>> } >>> >>> >>> static int __orderly_poweroff(bool force) >>> { >>> int ret; >>> >>> ret = run_cmd(poweroff_cmd); >> >> When i tried with multiple orderly_poweroff calls ret was always 0. >> So every 250mS i see this ret = 0. >> >>> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC >>> >>> if (ret && force) { >> >> So it never entered this path. ret = 0 so if is not executed. > > I think your setup has two major problems then: > 1. when kernel runs userspace power off, it execs properly, in fact, it > is not triggered. It does work neatly when orderly_poweroff is called once. It gracefully shuts down the system. I see problem is when we call run_cmd every 250mS multiple times. > 2. when you finally exec it, it takes 5s to finish. I will share the logs. > > If this is correct, I think my suggestions on the other email > still holds. > > BR, >
On 04/12/2017 11:44 AM, Keerthy wrote: > > > On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: >> >> >> On 04/12/2017 10:44 AM, Eduardo Valentin wrote: >>> Hello, >>> >> ... >> >>> >>> I agree. But there it nothing that says it is not reenterable. If you >>> saw something in this line, can you please share? >>> >>>>>> will you generate a patch to do this? >>>>> Sure. I will generate a patch to take care of 1) To make sure that >>>>> orderly_poweroff is called only once right away. I have already >>>>> tested. >>>>> >>>>> for 2) Cancel all the scheduled work queues to monitor the >>>>> temperature. >>>>> I will take some more time to make it and test. >>>>> >>>>> Is that okay? Or you want me to send both together? >>>>> >>>> I think you can send patch for step 1 first. >>> >>> I am happy to see that Keerthy found the problem with his setup and a >>> possible solution. But I have a few concerns here. >>> >>> 1. If regular shutdown process takes 10seconds, that is a ballpark that >>> thermal should never wait. orderly_poweroff() calls run_cmd() with wait >>> flag set. That means, if regular userland shutdown takes 10s, we are >>> waiting for it. Obviously this not acceptable. Specially if you setup >>> critical trip to be 125C. Now, if you properly size the critical trip to >>> fire before hotspot really reach 125C, for 10s (or the time it takes to >>> shutdown), then fine. But based on what was described in this thread, >>> his system is waiting 10s on regular shutdown, and his silicon is on >>> out-of-spec temperature for 10s, which is wrong. >>> >>> 2. The above scenario is not acceptable in a long run, specially from a >>> reliability perspective. If orderly_poweroff() has a possibility to >>> simply never return (or take too long), I would say the thermal >>> subsystem is using the wrong API. ^ this question just repeat everything which was already discussed in previous versions of this patch - orderly_poweroff() is not good for critical shutdown/poweroff, but what to use instead? >>> >> >> >> Hh, I do not see that orderly_poweroff() will wait for anything now: >> void orderly_poweroff(bool force) >> { >> if (force) /* do not override the pending "true" */ >> poweroff_force = true; >> schedule_work(&poweroff_work); >> ^^^^^^^ async call. even here can be pretty big delay if system is under pressure >> } >> >> >> static int __orderly_poweroff(bool force) >> { >> int ret; >> >> ret = run_cmd(poweroff_cmd); > > When i tried with multiple orderly_poweroff calls ret was always 0. > So every 250mS i see this ret = 0. > >> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC >> >> if (ret && force) { > > So it never entered this path. ret = 0 so if is not executed. correct, because exec can find poweroff tool and start it, so you, most probably, have bunch of this tool instance running in parallel (some of them can fail or block) Issue 1 - you've sent fix for is actual :). Again, thermal has no control of power off process once run_cmd() is returned, and it do not know what US poweroff binary is doing and how much time can it take (which include disks maintenance - loooong delay). > >> pr_warn("Failed to start orderly shutdown: forcing the issue\n"); >> >> /* >> * I guess this should try to kick off some daemon to sync and >> * poweroff asap. Or not even bother syncing if we're doing an >> * emergency shutdown? >> */ >> emergency_sync(); >> kernel_power_off(); >> ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist >> } >> >> return ret; >> } >> >> static bool poweroff_force; >> >> static void poweroff_work_func(struct work_struct *work) >> { >> __orderly_poweroff(poweroff_force); >> } >> >> As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result >> of US poweroff binary execution. >> >>> >>> If you are going to implement the above two patches, keep in mind: >>> i. At least within the thermal subsystem, you need to take care of all >>> zones that could trigger a shutdown. >>> ii. serializing the calls to orderly_poweroff() seams to be more >>> concerning than cancelling all monitoring. >>> >>> >>
On Wednesday 12 April 2017 10:38 PM, Grygorii Strashko wrote: > > > On 04/12/2017 11:44 AM, Keerthy wrote: >> >> >> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: >>> >>> >>> On 04/12/2017 10:44 AM, Eduardo Valentin wrote: >>>> Hello, >>>> >>> ... >>> >>>> >>>> I agree. But there it nothing that says it is not reenterable. If you >>>> saw something in this line, can you please share? >>>> >>>>>>> will you generate a patch to do this? >>>>>> Sure. I will generate a patch to take care of 1) To make sure that >>>>>> orderly_poweroff is called only once right away. I have already >>>>>> tested. >>>>>> >>>>>> for 2) Cancel all the scheduled work queues to monitor the >>>>>> temperature. >>>>>> I will take some more time to make it and test. >>>>>> >>>>>> Is that okay? Or you want me to send both together? >>>>>> >>>>> I think you can send patch for step 1 first. >>>> >>>> I am happy to see that Keerthy found the problem with his setup and a >>>> possible solution. But I have a few concerns here. >>>> >>>> 1. If regular shutdown process takes 10seconds, that is a ballpark that >>>> thermal should never wait. orderly_poweroff() calls run_cmd() with wait >>>> flag set. That means, if regular userland shutdown takes 10s, we are >>>> waiting for it. Obviously this not acceptable. Specially if you setup >>>> critical trip to be 125C. Now, if you properly size the critical trip to >>>> fire before hotspot really reach 125C, for 10s (or the time it takes to >>>> shutdown), then fine. But based on what was described in this thread, >>>> his system is waiting 10s on regular shutdown, and his silicon is on >>>> out-of-spec temperature for 10s, which is wrong. >>>> >>>> 2. The above scenario is not acceptable in a long run, specially from a >>>> reliability perspective. If orderly_poweroff() has a possibility to >>>> simply never return (or take too long), I would say the thermal >>>> subsystem is using the wrong API. > > ^ this question just repeat everything which was already discussed in > previous versions of this patch - orderly_poweroff() is not good for critical shutdown/poweroff, > but what to use instead? > > >>>> >>> >>> >>> Hh, I do not see that orderly_poweroff() will wait for anything now: >>> void orderly_poweroff(bool force) >>> { >>> if (force) /* do not override the pending "true" */ >>> poweroff_force = true; >>> schedule_work(&poweroff_work); >>> ^^^^^^^ async call. even here can be pretty big delay if system is under pressure >>> } >>> >>> >>> static int __orderly_poweroff(bool force) >>> { >>> int ret; >>> >>> ret = run_cmd(poweroff_cmd); >> >> When i tried with multiple orderly_poweroff calls ret was always 0. >> So every 250mS i see this ret = 0. >> >>> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC >>> >>> if (ret && force) { >> >> So it never entered this path. ret = 0 so if is not executed. > > correct, because exec can find poweroff tool and start it, so you, > most probably, have bunch of this tool instance running in parallel (some of them can fail or block) > Issue 1 - you've sent fix for is actual :). Precisely yes! > > Again, thermal has no control of power off process once run_cmd() is returned, > and it do not know what US poweroff binary is doing and how much time can it take > (which include disks maintenance - loooong delay). > >> >>> pr_warn("Failed to start orderly shutdown: forcing the issue\n"); >>> >>> /* >>> * I guess this should try to kick off some daemon to sync and >>> * poweroff asap. Or not even bother syncing if we're doing an >>> * emergency shutdown? >>> */ >>> emergency_sync(); >>> kernel_power_off(); >>> ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist >>> } >>> >>> return ret; >>> } >>> >>> static bool poweroff_force; >>> >>> static void poweroff_work_func(struct work_struct *work) >>> { >>> __orderly_poweroff(poweroff_force); >>> } >>> >>> As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result >>> of US poweroff binary execution. >>> >>>> >>>> If you are going to implement the above two patches, keep in mind: >>>> i. At least within the thermal subsystem, you need to take care of all >>>> zones that could trigger a shutdown. >>>> ii. serializing the calls to orderly_poweroff() seams to be more >>>> concerning than cancelling all monitoring. >>>> >>>> >>> >
On Wed, Apr 12, 2017 at 10:41:00PM +0530, Keerthy wrote: > > > On Wednesday 12 April 2017 10:38 PM, Grygorii Strashko wrote: > > > > > > On 04/12/2017 11:44 AM, Keerthy wrote: > >> > >> > >> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: > >>> > >>> > >>> On 04/12/2017 10:44 AM, Eduardo Valentin wrote: > >>>> Hello, > >>>> > >>> ... > >>> > >>>> > >>>> I agree. But there it nothing that says it is not reenterable. If you > >>>> saw something in this line, can you please share? > >>>> > >>>>>>> will you generate a patch to do this? > >>>>>> Sure. I will generate a patch to take care of 1) To make sure that > >>>>>> orderly_poweroff is called only once right away. I have already > >>>>>> tested. > >>>>>> > >>>>>> for 2) Cancel all the scheduled work queues to monitor the > >>>>>> temperature. > >>>>>> I will take some more time to make it and test. > >>>>>> > >>>>>> Is that okay? Or you want me to send both together? > >>>>>> > >>>>> I think you can send patch for step 1 first. > >>>> > >>>> I am happy to see that Keerthy found the problem with his setup and a > >>>> possible solution. But I have a few concerns here. > >>>> > >>>> 1. If regular shutdown process takes 10seconds, that is a ballpark that > >>>> thermal should never wait. orderly_poweroff() calls run_cmd() with wait > >>>> flag set. That means, if regular userland shutdown takes 10s, we are > >>>> waiting for it. Obviously this not acceptable. Specially if you setup > >>>> critical trip to be 125C. Now, if you properly size the critical trip to > >>>> fire before hotspot really reach 125C, for 10s (or the time it takes to > >>>> shutdown), then fine. But based on what was described in this thread, > >>>> his system is waiting 10s on regular shutdown, and his silicon is on > >>>> out-of-spec temperature for 10s, which is wrong. > >>>> > >>>> 2. The above scenario is not acceptable in a long run, specially from a > >>>> reliability perspective. If orderly_poweroff() has a possibility to > >>>> simply never return (or take too long), I would say the thermal > >>>> subsystem is using the wrong API. > > > > ^ this question just repeat everything which was already discussed in > > previous versions of this patch - orderly_poweroff() is not good for critical shutdown/poweroff, > > but what to use instead? It is still useful on a properly sized system. The point is the thermal subsystem still wants to give one opportunity to gracefully shutdown the running system on a thermal scenario, as I explained in the other email. But, you have to do this accounting the down time, and your reliability concerns. > > > > > >>>> > >>> > >>> > >>> Hh, I do not see that orderly_poweroff() will wait for anything now: > >>> void orderly_poweroff(bool force) > >>> { > >>> if (force) /* do not override the pending "true" */ > >>> poweroff_force = true; > >>> schedule_work(&poweroff_work); > >>> ^^^^^^^ async call. even here can be pretty big delay if system is under pressure > >>> } > >>> > >>> > >>> static int __orderly_poweroff(bool force) > >>> { > >>> int ret; > >>> > >>> ret = run_cmd(poweroff_cmd); > >> > >> When i tried with multiple orderly_poweroff calls ret was always 0. > >> So every 250mS i see this ret = 0. > >> > >>> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC > >>> > >>> if (ret && force) { > >> > >> So it never entered this path. ret = 0 so if is not executed. > > > > correct, because exec can find poweroff tool and start it, so you, > > most probably, have bunch of this tool instance running in parallel (some of them can fail or block) > > Issue 1 - you've sent fix for is actual :). > > Precisely yes! > As I mentioned, the fix is a two fold, a. avoid spam of orderly_poweroff(), but make sure eventually we shutdown. > > > > Again, thermal has no control of power off process once run_cmd() is returned, > > and it do not know what US poweroff binary is doing and how much time can it take > > (which include disks maintenance - loooong delay). > > > >> > >>> pr_warn("Failed to start orderly shutdown: forcing the issue\n"); > >>> > >>> /* > >>> * I guess this should try to kick off some daemon to sync and > >>> * poweroff asap. Or not even bother syncing if we're doing an > >>> * emergency shutdown? > >>> */ > >>> emergency_sync(); > >>> kernel_power_off(); > >>> ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist > >>> } > >>> > >>> return ret; > >>> } > >>> > >>> static bool poweroff_force; > >>> > >>> static void poweroff_work_func(struct work_struct *work) > >>> { > >>> __orderly_poweroff(poweroff_force); > >>> } > >>> > >>> As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result > >>> of US poweroff binary execution. > >>> > >>>> > >>>> If you are going to implement the above two patches, keep in mind: > >>>> i. At least within the thermal subsystem, you need to take care of all > >>>> zones that could trigger a shutdown. > >>>> ii. serializing the calls to orderly_poweroff() seams to be more > >>>> concerning than cancelling all monitoring. > >>>> > >>>> > >>> > >
On 12/04/17 20:24, Eduardo Valentin wrote: > On Wed, Apr 12, 2017 at 10:41:00PM +0530, Keerthy wrote: >> >> >> On Wednesday 12 April 2017 10:38 PM, Grygorii Strashko wrote: >>> >>> >>> On 04/12/2017 11:44 AM, Keerthy wrote: >>>> >>>> >>>> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: >>>>> >>>>> >>>>> On 04/12/2017 10:44 AM, Eduardo Valentin wrote: >>>>>> Hello, >>>>>> >>>>> ... >>>>> >>>>>> >>>>>> I agree. But there it nothing that says it is not reenterable. If you >>>>>> saw something in this line, can you please share? >>>>>> >>>>>>>>> will you generate a patch to do this? >>>>>>>> Sure. I will generate a patch to take care of 1) To make sure that >>>>>>>> orderly_poweroff is called only once right away. I have already >>>>>>>> tested. >>>>>>>> >>>>>>>> for 2) Cancel all the scheduled work queues to monitor the >>>>>>>> temperature. >>>>>>>> I will take some more time to make it and test. >>>>>>>> >>>>>>>> Is that okay? Or you want me to send both together? >>>>>>>> >>>>>>> I think you can send patch for step 1 first. >>>>>> >>>>>> I am happy to see that Keerthy found the problem with his setup and a >>>>>> possible solution. But I have a few concerns here. >>>>>> >>>>>> 1. If regular shutdown process takes 10seconds, that is a ballpark that >>>>>> thermal should never wait. orderly_poweroff() calls run_cmd() with wait >>>>>> flag set. That means, if regular userland shutdown takes 10s, we are >>>>>> waiting for it. Obviously this not acceptable. Specially if you setup >>>>>> critical trip to be 125C. Now, if you properly size the critical trip to >>>>>> fire before hotspot really reach 125C, for 10s (or the time it takes to >>>>>> shutdown), then fine. But based on what was described in this thread, >>>>>> his system is waiting 10s on regular shutdown, and his silicon is on >>>>>> out-of-spec temperature for 10s, which is wrong. >>>>>> >>>>>> 2. The above scenario is not acceptable in a long run, specially from a >>>>>> reliability perspective. If orderly_poweroff() has a possibility to >>>>>> simply never return (or take too long), I would say the thermal >>>>>> subsystem is using the wrong API. >>> >>> ^ this question just repeat everything which was already discussed in >>> previous versions of this patch - orderly_poweroff() is not good for critical shutdown/poweroff, >>> but what to use instead? > > It is still useful on a properly sized system. The point is the thermal > subsystem still wants to give one opportunity to gracefully shutdown the > running system on a thermal scenario, as I explained in the other email. > But, you have to do this accounting the down time, and your reliability > concerns. > >>> >>> >>>>>> >>>>> >>>>> >>>>> Hh, I do not see that orderly_poweroff() will wait for anything now: >>>>> void orderly_poweroff(bool force) >>>>> { >>>>> if (force) /* do not override the pending "true" */ >>>>> poweroff_force = true; >>>>> schedule_work(&poweroff_work); >>>>> ^^^^^^^ async call. even here can be pretty big delay if system is under pressure >>>>> } >>>>> >>>>> >>>>> static int __orderly_poweroff(bool force) >>>>> { >>>>> int ret; >>>>> >>>>> ret = run_cmd(poweroff_cmd); >>>> >>>> When i tried with multiple orderly_poweroff calls ret was always 0. >>>> So every 250mS i see this ret = 0. >>>> >>>>> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC >>>>> >>>>> if (ret && force) { >>>> >>>> So it never entered this path. ret = 0 so if is not executed. >>> >>> correct, because exec can find poweroff tool and start it, so you, >>> most probably, have bunch of this tool instance running in parallel (some of them can fail or block) >>> Issue 1 - you've sent fix for is actual :). >> >> Precisely yes! >> > > As I mentioned, the fix is a two fold, a. avoid spam of > orderly_poweroff(), but make sure eventually we shutdown. Just chirping in here a bit myself also, the long latencies in the poweroff executing are basically because in our case it will do all of the following: - stop all running daemons - kill all remaining processes - unload all modules - sync / unmount all filesystems - etc. - poweroff the system when everything else has been gracefully done The order of these things are not necessarily what I listed above, but overall it takes quite a bit of time. It doesn't matter if you execute all of this over NFS or SD card or ramdisk, it is a long procedure. -Tero > >>> >>> Again, thermal has no control of power off process once run_cmd() is returned, >>> and it do not know what US poweroff binary is doing and how much time can it take >>> (which include disks maintenance - loooong delay). >>> >>>> >>>>> pr_warn("Failed to start orderly shutdown: forcing the issue\n"); >>>>> >>>>> /* >>>>> * I guess this should try to kick off some daemon to sync and >>>>> * poweroff asap. Or not even bother syncing if we're doing an >>>>> * emergency shutdown? >>>>> */ >>>>> emergency_sync(); >>>>> kernel_power_off(); >>>>> ^^^ force power off, but only if run_cmd() failed - for example /sbin/poweroff doesn't exist >>>>> } >>>>> >>>>> return ret; >>>>> } >>>>> >>>>> static bool poweroff_force; >>>>> >>>>> static void poweroff_work_func(struct work_struct *work) >>>>> { >>>>> __orderly_poweroff(poweroff_force); >>>>> } >>>>> >>>>> As result thermal has no control of power off any more after calling orderly_poweroff() and can get the result >>>>> of US poweroff binary execution. >>>>> >>>>>> >>>>>> If you are going to implement the above two patches, keep in mind: >>>>>> i. At least within the thermal subsystem, you need to take care of all >>>>>> zones that could trigger a shutdown. >>>>>> ii. serializing the calls to orderly_poweroff() seams to be more >>>>>> concerning than cancelling all monitoring. >>>>>> >>>>>> >>>>> >>>
On Thursday 13 April 2017 12:13 AM, Tero Kristo wrote: > On 12/04/17 20:24, Eduardo Valentin wrote: >> On Wed, Apr 12, 2017 at 10:41:00PM +0530, Keerthy wrote: >>> >>> >>> On Wednesday 12 April 2017 10:38 PM, Grygorii Strashko wrote: >>>> >>>> >>>> On 04/12/2017 11:44 AM, Keerthy wrote: >>>>> >>>>> >>>>> On Wednesday 12 April 2017 10:01 PM, Grygorii Strashko wrote: >>>>>> >>>>>> >>>>>> On 04/12/2017 10:44 AM, Eduardo Valentin wrote: >>>>>>> Hello, >>>>>>> >>>>>> ... >>>>>> >>>>>>> >>>>>>> I agree. But there it nothing that says it is not reenterable. If >>>>>>> you >>>>>>> saw something in this line, can you please share? >>>>>>> >>>>>>>>>> will you generate a patch to do this? >>>>>>>>> Sure. I will generate a patch to take care of 1) To make sure that >>>>>>>>> orderly_poweroff is called only once right away. I have already >>>>>>>>> tested. >>>>>>>>> >>>>>>>>> for 2) Cancel all the scheduled work queues to monitor the >>>>>>>>> temperature. >>>>>>>>> I will take some more time to make it and test. >>>>>>>>> >>>>>>>>> Is that okay? Or you want me to send both together? >>>>>>>>> >>>>>>>> I think you can send patch for step 1 first. >>>>>>> >>>>>>> I am happy to see that Keerthy found the problem with his setup >>>>>>> and a >>>>>>> possible solution. But I have a few concerns here. >>>>>>> >>>>>>> 1. If regular shutdown process takes 10seconds, that is a >>>>>>> ballpark that >>>>>>> thermal should never wait. orderly_poweroff() calls run_cmd() >>>>>>> with wait >>>>>>> flag set. That means, if regular userland shutdown takes 10s, we are >>>>>>> waiting for it. Obviously this not acceptable. Specially if you >>>>>>> setup >>>>>>> critical trip to be 125C. Now, if you properly size the critical >>>>>>> trip to >>>>>>> fire before hotspot really reach 125C, for 10s (or the time it >>>>>>> takes to >>>>>>> shutdown), then fine. But based on what was described in this >>>>>>> thread, >>>>>>> his system is waiting 10s on regular shutdown, and his silicon is on >>>>>>> out-of-spec temperature for 10s, which is wrong. >>>>>>> >>>>>>> 2. The above scenario is not acceptable in a long run, specially >>>>>>> from a >>>>>>> reliability perspective. If orderly_poweroff() has a possibility to >>>>>>> simply never return (or take too long), I would say the thermal >>>>>>> subsystem is using the wrong API. >>>> >>>> ^ this question just repeat everything which was already discussed in >>>> previous versions of this patch - orderly_poweroff() is not good for >>>> critical shutdown/poweroff, >>>> but what to use instead? >> >> It is still useful on a properly sized system. The point is the thermal >> subsystem still wants to give one opportunity to gracefully shutdown the >> running system on a thermal scenario, as I explained in the other email. >> But, you have to do this accounting the down time, and your reliability >> concerns. >> >>>> >>>> >>>>>>> >>>>>> >>>>>> >>>>>> Hh, I do not see that orderly_poweroff() will wait for anything now: >>>>>> void orderly_poweroff(bool force) >>>>>> { >>>>>> if (force) /* do not override the pending "true" */ >>>>>> poweroff_force = true; >>>>>> schedule_work(&poweroff_work); >>>>>> ^^^^^^^ async call. even here can be pretty big delay if system is >>>>>> under pressure >>>>>> } >>>>>> >>>>>> >>>>>> static int __orderly_poweroff(bool force) >>>>>> { >>>>>> int ret; >>>>>> >>>>>> ret = run_cmd(poweroff_cmd); >>>>> >>>>> When i tried with multiple orderly_poweroff calls ret was always 0. >>>>> So every 250mS i see this ret = 0. >>>>> >>>>>> ^^^^ no wait for the process - only for exec. flags == UMH_WAIT_EXEC >>>>>> >>>>>> if (ret && force) { >>>>> >>>>> So it never entered this path. ret = 0 so if is not executed. >>>> >>>> correct, because exec can find poweroff tool and start it, so you, >>>> most probably, have bunch of this tool instance running in parallel >>>> (some of them can fail or block) >>>> Issue 1 - you've sent fix for is actual :). >>> >>> Precisely yes! >>> >> >> As I mentioned, the fix is a two fold, a. avoid spam of >> orderly_poweroff(), but make sure eventually we shutdown. > > Just chirping in here a bit myself also, the long latencies in the > poweroff executing are basically because in our case it will do all of > the following: > > - stop all running daemons > - kill all remaining processes > - unload all modules > - sync / unmount all filesystems > - etc. > - poweroff the system when everything else has been gracefully done > > The order of these things are not necessarily what I listed above, but > overall it takes quite a bit of time. It doesn't matter if you execute > all of this over NFS or SD card or ramdisk, it is a long procedure. Yes. Thanks for the pointers Tero. As i had mentioned, Here is the log on DRA72-EVM with arago filesystem over nfs on the next branch with my patch to restrict orderly_poweroff to one cycle. http://pastebin.ubuntu.com/24371980/ I triggered thermal shutdown by using THERMAL_EMULATION. 5-10S was on a good run and we can see that with a full size file system over nfs its taking about 30+ seconds to orderly_poweroff. I also profiled a poweroff command timing. That also takes more than 20 Seconds. Here is the log: http://pastebin.ubuntu.com/24372012/ As Eduardo pointed out this is pretty long. I had 2 suggestions for that: 1) To decrease the thermal critical threshold below the actual hardware thermal shutdown threshold. 2) To have a thermal_backup shutdown which uses kernel_power_off when a configured time expires after we have triggered orderly_poweroff and directly shuts off the system. Regards, Keerthy > > -Tero > >> >>>> >>>> Again, thermal has no control of power off process once run_cmd() >>>> is returned, >>>> and it do not know what US poweroff binary is doing and how much >>>> time can it take >>>> (which include disks maintenance - loooong delay). >>>> >>>>> >>>>>> pr_warn("Failed to start orderly shutdown: forcing the >>>>>> issue\n"); >>>>>> >>>>>> /* >>>>>> * I guess this should try to kick off some daemon to sync >>>>>> and >>>>>> * poweroff asap. Or not even bother syncing if we're >>>>>> doing an >>>>>> * emergency shutdown? >>>>>> */ >>>>>> emergency_sync(); >>>>>> kernel_power_off(); >>>>>> ^^^ force power off, but only if run_cmd() failed - for example >>>>>> /sbin/poweroff doesn't exist >>>>>> } >>>>>> >>>>>> return ret; >>>>>> } >>>>>> >>>>>> static bool poweroff_force; >>>>>> >>>>>> static void poweroff_work_func(struct work_struct *work) >>>>>> { >>>>>> __orderly_poweroff(poweroff_force); >>>>>> } >>>>>> >>>>>> As result thermal has no control of power off any more after >>>>>> calling orderly_poweroff() and can get the result >>>>>> of US poweroff binary execution. >>>>>> >>>>>>> >>>>>>> If you are going to implement the above two patches, keep in mind: >>>>>>> i. At least within the thermal subsystem, you need to take care >>>>>>> of all >>>>>>> zones that could trigger a shutdown. >>>>>>> ii. serializing the calls to orderly_poweroff() seams to be more >>>>>>> concerning than cancelling all monitoring. >>>>>>> >>>>>>> >>>>>> >>>> >
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig index 0a16cf4..4cc55f9 100644 --- a/drivers/thermal/Kconfig +++ b/drivers/thermal/Kconfig @@ -15,6 +15,19 @@ menuconfig THERMAL if THERMAL +config THERMAL_EMERGENCY_POWEROFF_DELAY_MS + int "Emergency poweroff delay in milli-seconds" + depends on THERMAL + default 0 + help + The number of milliseconds to delay before emergency + poweroff kicks in. The delay should be carefully profiled + so as to give adequate time for orderly_poweroff. In case + of failure of an orderly_poweroff the emergency poweroff + kicks in after the delay has elapsed and shuts down the system. + + If set to 0 poweroff will happen immediately. + config THERMAL_HWMON bool prompt "Expose thermal sensors as hwmon device" diff --git a/drivers/thermal/thermal_core.c b/drivers/thermal/thermal_core.c index 11f0675..dc7fdd4 100644 --- a/drivers/thermal/thermal_core.c +++ b/drivers/thermal/thermal_core.c @@ -322,6 +322,47 @@ static void handle_non_critical_trips(struct thermal_zone_device *tz, def_governor->throttle(tz, trip); } +/** + * emergency_poweroff_func - emergency poweroff work after a known delay + * @work: work_struct associated with the emergency poweroff function + * + * This function is called in very critical situations to force + * a kernel poweroff after a configurable timeout value. + */ +static void emergency_poweroff_func(struct work_struct *work) +{ + /** + * We have reached here after the emergency thermal shutdown + * Waiting period has expired. This means orderly_poweroff has + * not been able to shut off the system for some reason. + * Try to shut down the system immediately using pm_power_off + * if populated + */ + pr_warn("Attempting kernel_power_off\n"); + if (pm_power_off) + pm_power_off(); + + /** + * Worst of the worst case trigger emergency restart + */ + pr_warn("kernel_power_off has failed! Attempting emergency_restart\n"); + emergency_restart(); +} + +static DECLARE_DELAYED_WORK(emergency_poweroff_work, emergency_poweroff_func); + +/** + * emergency_poweroff - Trigger an emergency system poweroff + * + * This may be called from any critical situation to trigger a system shutdown + * after a known period of time. By default the delay is 0 millisecond + */ +void thermal_emergency_poweroff(void) +{ + schedule_delayed_work(&emergency_poweroff_work, + msecs_to_jiffies(CONFIG_THERMAL_EMERGENCY_POWEROFF_DELAY_MS)); +} + static void handle_critical_trips(struct thermal_zone_device *tz, int trip, enum thermal_trip_type trip_type) { @@ -343,6 +384,7 @@ static void handle_critical_trips(struct thermal_zone_device *tz, "critical temperature reached(%d C),shutting down\n", tz->temperature / 1000); orderly_poweroff(true); + thermal_emergency_poweroff(); } }
orderly_poweroff is triggered when a graceful shutdown of system is desired. This may be used in many critical states of the kernel such as when subsystems detects conditions such as critical temperature conditions. However, in certain conditions in system boot up sequences like those in the middle of driver probes being initiated, userspace will be unable to power off the system in a clean manner and leaves the system in a critical state. In cases like these, the /sbin/poweroff will return success (having forked off to attempt powering off the system. However, the system overall will fail to completely poweroff (since other modules will be probed) and the system is still functional with no userspace (since that would have shut itself off). However, there is no clean way of detecting such failure of userspace powering off the system. In such scenarios, it is necessary for a backup workqueue to be able to force a shutdown of the system when orderly shutdown is not successful after a configurable time period. Reported-by: Nishanth Menon <nm@ti.com> Signed-off-by: Keerthy <j-keerthy@ti.com> --- drivers/thermal/Kconfig | 13 +++++++++++++ drivers/thermal/thermal_core.c | 42 ++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 55 insertions(+)