Message ID | 1370597810-1153-1-git-send-email-josephl@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 06/07/2013 03:36 AM, Joseph Lo wrote: > The normal CPU hotplug flow in kernel and the flow for Tegra we expected, > is checking the CPU ID is OK for hotplug by "tegra_cpu_disable", the CPU > that would be hotplugged runs into a power-gate state by "tegra_cpu_die", > then the other CPU waits for the CPU that was hotplugged in reset and > clock gate it by "tegra_cpu_kill". That means we don't support the CPU > being stopped or put into offline by trigger "tegra_cpu_kill" directly. > It may cause a busy loop for waiting CPU in reset. > > After the commit "62e930e reboot: rigrate shutdown/reboot to boot cpu", > we remove "disable_nonboot_cpus" when kernel_{restart,halt,power_off}. > But the ARM kernel trigger "send_smp_stop" when machine_shutdown, that > would cause the "tegra_cpu_kill" directly without "tegra_cpu_die" first. > > We hook "disable_nonboot_cpus" in "reboot_notifier" to avoid that happens. > And it can work for reboot, shutdown, halt and kexec. I don't believe this is the correct solution. If the semantics of cpu_kill/cpu_die are such that it's legal to call only cpu_kill without having cause cpu_die to run on the killed CPU first, then Tegra's implementation is buggy. We should simply fix that, rather than avoiding this by forcing a different order for the calls to cpu_kill/cpu_die. If the semantics of cpu_kill/cpu_die are such that one /must/ cause cpu_die to run on the killed CPU before cpu_kill can be used on it, then there's a bug in the code that isn't doing that. I'm CCing a few people in an attempt to find out exactly what the expected semantics are for cpu_kill/cpu_die; is it legal to call cpu_kill without having first caused cpu_die to execute?
On Fri, Jun 07, 2013 at 05:44:33PM +0100, Stephen Warren wrote: > On 06/07/2013 03:36 AM, Joseph Lo wrote: > > The normal CPU hotplug flow in kernel and the flow for Tegra we expected, > > is checking the CPU ID is OK for hotplug by "tegra_cpu_disable", the CPU > > that would be hotplugged runs into a power-gate state by "tegra_cpu_die", > > then the other CPU waits for the CPU that was hotplugged in reset and > > clock gate it by "tegra_cpu_kill". That means we don't support the CPU > > being stopped or put into offline by trigger "tegra_cpu_kill" directly. > > It may cause a busy loop for waiting CPU in reset. > > > > After the commit "62e930e reboot: rigrate shutdown/reboot to boot cpu", > > we remove "disable_nonboot_cpus" when kernel_{restart,halt,power_off}. > > But the ARM kernel trigger "send_smp_stop" when machine_shutdown, that > > would cause the "tegra_cpu_kill" directly without "tegra_cpu_die" first. > > > > We hook "disable_nonboot_cpus" in "reboot_notifier" to avoid that happens. > > And it can work for reboot, shutdown, halt and kexec. > > I don't believe this is the correct solution. > > If the semantics of cpu_kill/cpu_die are such that it's legal to call > only cpu_kill without having cause cpu_die to run on the killed CPU > first, then Tegra's implementation is buggy. We should simply fix that, > rather than avoiding this by forcing a different order for the calls to > cpu_kill/cpu_die. > > If the semantics of cpu_kill/cpu_die are such that one /must/ cause > cpu_die to run on the killed CPU before cpu_kill can be used on it, then > there's a bug in the code that isn't doing that. > > I'm CCing a few people in an attempt to find out exactly what the > expected semantics are for cpu_kill/cpu_die; is it legal to call > cpu_kill without having first caused cpu_die to execute? By cpu_kill, do you mean platform_cpu_kill called from __cpu_die? If so, __cpu_die and cpu_die are definitely supposed to be treated as a pair, since they synchronise via the cpu_died completion. Will
On 06/07/2013 12:18 PM, Will Deacon wrote: > On Fri, Jun 07, 2013 at 05:44:33PM +0100, Stephen Warren wrote: >> On 06/07/2013 03:36 AM, Joseph Lo wrote: >>> The normal CPU hotplug flow in kernel and the flow for Tegra we expected, >>> is checking the CPU ID is OK for hotplug by "tegra_cpu_disable", the CPU >>> that would be hotplugged runs into a power-gate state by "tegra_cpu_die", >>> then the other CPU waits for the CPU that was hotplugged in reset and >>> clock gate it by "tegra_cpu_kill". That means we don't support the CPU >>> being stopped or put into offline by trigger "tegra_cpu_kill" directly. >>> It may cause a busy loop for waiting CPU in reset. >>> >>> After the commit "62e930e reboot: rigrate shutdown/reboot to boot cpu", >>> we remove "disable_nonboot_cpus" when kernel_{restart,halt,power_off}. >>> But the ARM kernel trigger "send_smp_stop" when machine_shutdown, that >>> would cause the "tegra_cpu_kill" directly without "tegra_cpu_die" first. >>> >>> We hook "disable_nonboot_cpus" in "reboot_notifier" to avoid that happens. >>> And it can work for reboot, shutdown, halt and kexec. >> >> I don't believe this is the correct solution. >> >> If the semantics of cpu_kill/cpu_die are such that it's legal to call >> only cpu_kill without having cause cpu_die to run on the killed CPU >> first, then Tegra's implementation is buggy. We should simply fix that, >> rather than avoiding this by forcing a different order for the calls to >> cpu_kill/cpu_die. >> >> If the semantics of cpu_kill/cpu_die are such that one /must/ cause >> cpu_die to run on the killed CPU before cpu_kill can be used on it, then >> there's a bug in the code that isn't doing that. >> >> I'm CCing a few people in an attempt to find out exactly what the >> expected semantics are for cpu_kill/cpu_die; is it legal to call >> cpu_kill without having first caused cpu_die to execute? > > By cpu_kill, do you mean platform_cpu_kill called from __cpu_die? The struct smp_operations .cpu_kill/.cpu_die hooks. So, yes. > If so, > __cpu_die and cpu_die are definitely supposed to be treated as a pair, since > they synchronise via the cpu_died completion. So the analysis I did, cribbed from our internal bug report so hopefully it makes sense without any context there, was: ========== Before that patch (62e930e reboot: "rigrate shutdown/reboot to boot cpu"), kernel/sys.c:kernel_restart() and kernel_power_off() used to use CPU hotplug mechanisms to unplug every CPU other than one CPU, then do the reboot or shutdown. The ARM implementation of machine_restart() and machine_power_off() both call machine_shutdown() which calls smp_send_stop(), which IPIs to every CPU to tell it to stop. However, since all the CPUs were unplugged, this was a no-op. With the patch, the kernel simply disables scheduling on all CPUs except logical CPU 0 in kernel_restart() and kernel_power_off(). This guarantees that the code is running on logical CPU 0, but leaves the other CPUs still present. Hence, the call to smp_send_stop() from the ARM code is no longer a no-op. This code hangs. The implementation of smp_send_stop() raises IPI_CPU_STOP on each CPU (other than logical CPU 0). This eventually calls down to tegra_cpu_kill()[1], which calls tegra_wait_cpu_in_reset() which calls tegra20_wait_cpu_in_reset(). That hangs, because nothing has ever told the flow controller to put the CPU in reset, so logical CPU 0 waits indefinitely for this to happen, which is the hang. ========== [1] Perhaps the issue is why ipi_send_stop() calls down into tegra_cpu_kill() rather than tegra_cpu_die(), since die() is what should be run on the killed CPU, and kill() on the killing CPU?
On 06/07/2013 12:56 PM, Stephen Warren wrote: ... > [1] Perhaps the issue is why ipi_send_stop() calls down into > tegra_cpu_kill() rather than tegra_cpu_die(), since die() is what > should be run on the killed CPU, and kill() on the killing CPU? Scratch that; I don't think it's calling down to /either/; I was confused. It seems like it /should/ call cpu_die() though, at least if hotplug is enabled, right?
On Fri, Jun 07, 2013 at 03:28:38PM -0600, Stephen Warren wrote: > On 06/07/2013 12:56 PM, Stephen Warren wrote: > ... > > [1] Perhaps the issue is why ipi_send_stop() calls down into > > tegra_cpu_kill() rather than tegra_cpu_die(), since die() is what > > should be run on the killed CPU, and kill() on the killing CPU? > > Scratch that; I don't think it's calling down to /either/; I was > confused. It seems like it /should/ call cpu_die() though, at least if > hotplug is enabled, right? The problem is really complex. CPU hotplug is done in paths where we're relatively confident that the system is working correctly. So all the features such as scheduling are available, the timer ticks work and so forth. However, reboot is a totally different environment. This can happen from almost any context with the system in any state what so ever. A CPU could be stuck. A CPU could have oopsed. The CPU which is in the reboot code could be the CPU which has oopsed. It could be called from within an interrupt... What that means is the usual CPU hotplug methods can't be used in the reboot path. Well, they can, but it will be fragile. For reboot, the real solution there is not to use software-based reboot, but bring the other cores to a halt (which is what ipi_send_stop is doing) and then issue a hardware reset to the whole system, including the other CPUs.
On 06/07/2013 04:15 PM, Russell King - ARM Linux wrote: > On Fri, Jun 07, 2013 at 03:28:38PM -0600, Stephen Warren wrote: >> On 06/07/2013 12:56 PM, Stephen Warren wrote: >> ... >>> [1] Perhaps the issue is why ipi_send_stop() calls down into >>> tegra_cpu_kill() rather than tegra_cpu_die(), since die() is what >>> should be run on the killed CPU, and kill() on the killing CPU? >> >> Scratch that; I don't think it's calling down to /either/; I was >> confused. It seems like it /should/ call cpu_die() though, at least if >> hotplug is enabled, right? > > The problem is really complex. > > CPU hotplug is done in paths where we're relatively confident that the > system is working correctly. So all the features such as scheduling > are available, the timer ticks work and so forth. > > However, reboot is a totally different environment. This can happen > from almost any context with the system in any state what so ever. A > CPU could be stuck. A CPU could have oopsed. The CPU which is in > the reboot code could be the CPU which has oopsed. It could be called > from within an interrupt... > > What that means is the usual CPU hotplug methods can't be used in the > reboot path. Well, they can, but it will be fragile. > > For reboot, the real solution there is not to use software-based > reboot, but bring the other cores to a halt (which is what > ipi_send_stop is doing) and then issue a hardware reset to the whole > system, including the other CPUs. Ignoring the issues with oops in reboot, I think there's a bug in that when hotplug is enabled, smp_kill_cpus() calls platform_cpu_kill(), but nothing causes the failing CPU to ever execute smp_ops.cpu_die(). Hence, if the implementation of smp_ops.cpu_kill() relies on the target CPU having run smp_ops.cpu_die(), then smp_ops.cpu_kill() may not operate correctly. Or, must smp_ops.cpu_kill() not assume that smp_ops.cpu_die() will be called on the target CPU? What are the semantics here? Will mentioned that __cpu_die and cpu_die are a pair, but what about is the smp_ops are used directly; are they also supposed to be a pair? The change below solves the pairing issue, by making ipi_cpu_stop() perform the low-level part of hotplug that matches what smp_kill_cpus() call to platform_cpu_kill(). This certainly fixes the hang-in-reboot-or-shutdown problem on Tegra. > diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c > index 550d63c..541f667 100644 > --- a/arch/arm/kernel/smp.c > +++ b/arch/arm/kernel/smp.c > @@ -581,11 +581,20 @@ static void ipi_cpu_stop(unsigned int cpu) > > set_cpu_online(cpu, false); > > +#ifdef CONFIG_HOTPLUG_CPU > +#if 0 > + arch_cpu_idle_dead(); > +#else > + /* The body of arch_cpu_idle_dead() - which is better? */ > + cpu_die(); > +#endif > +#else > local_fiq_disable(); > local_irq_disable(); > > while (1) > cpu_relax(); > +#endif > } Some things I'm not sure of here: * cpu_die() calls idle_task_exit(). That's probably wrong if it's triggered from an IPI; who knows what task it's executing. That said, if migrate_to_reboot_cpu() did set_cpus_allowed_ptr(current, cpumask_of(cpu)), perhaps that guarantees the CPU is running the idle task since there's nothing else that could be running? * ipi_cpu_stop() currently calls local_fiq_disable(), but cpu_die() doesn't. Should both functions call both local_fiq_disable() and local_irq_disable()? * Perhaps smp_kill_cpus() should also be changed, to call cpu_die() not platform_cpu_kill(), to keep the pairing correct at that level too. Plus, I ignored any issues you raised for the oops case on reboot...
On Fri, Jun 07, 2013 at 04:39:32PM -0600, Stephen Warren wrote: > On 06/07/2013 04:15 PM, Russell King - ARM Linux wrote: > > For reboot, the real solution there is not to use software-based > > reboot, but bring the other cores to a halt (which is what > > ipi_send_stop is doing) and then issue a hardware reset to the whole > > system, including the other CPUs. > > Ignoring the issues with oops in reboot, I think there's a bug in that > when hotplug is enabled, smp_kill_cpus() calls platform_cpu_kill(), but > nothing causes the failing CPU to ever execute smp_ops.cpu_die(). Hence, > if the implementation of smp_ops.cpu_kill() relies on the target CPU > having run smp_ops.cpu_die(), then smp_ops.cpu_kill() may not operate > correctly. Well, smp_kill_cpus() was added to get around the kexec problem - transitioning from one kernel to the next kernel without going through a hardware reset. Maybe if we take a step back... 1. remove smp_kill_cpus() from smp_send_stop(). 2. remove machine_shutdown() from machine_halt(), machine_power_off() and machine_restart(). 3. call smp_send_stop() only from machine_halt(), machine_power_off() and machine_restart() 4. require a hardware-based reboot method for all SMP implementations; using soft_reboot() is not an option. This should get us into the situation where we have a reliable method of halting and rebooting the kernel everywhere, leaving kexec as being the remaining problem case. Currently, for that we effectively do smp_send_stop() followed by smp_kill_cpus(). The no-op change for kexec there is to allow smp_kill_cpus() to be called directly from machine_shutdown() - but I suspect there will still be stuff that's broken with that... So the ongoing problem remains - how to deal with kexec in a SMP environment where it's difficult to reliably take a secondary CPU offline to a safe place and then be able to restart it into the next kernel...
On Fri, Jun 07, 2013 at 11:55:12PM +0100, Russell King - ARM Linux wrote: > On Fri, Jun 07, 2013 at 04:39:32PM -0600, Stephen Warren wrote: > > On 06/07/2013 04:15 PM, Russell King - ARM Linux wrote: > > > For reboot, the real solution there is not to use software-based > > > reboot, but bring the other cores to a halt (which is what > > > ipi_send_stop is doing) and then issue a hardware reset to the whole > > > system, including the other CPUs. > > > > Ignoring the issues with oops in reboot, I think there's a bug in that > > when hotplug is enabled, smp_kill_cpus() calls platform_cpu_kill(), but > > nothing causes the failing CPU to ever execute smp_ops.cpu_die(). Hence, > > if the implementation of smp_ops.cpu_kill() relies on the target CPU > > having run smp_ops.cpu_die(), then smp_ops.cpu_kill() may not operate > > correctly. > > Well, smp_kill_cpus() was added to get around the kexec problem - > transitioning from one kernel to the next kernel without going through > a hardware reset. Maybe if we take a step back... > > 1. remove smp_kill_cpus() from smp_send_stop(). > 2. remove machine_shutdown() from machine_halt(), machine_power_off() > and machine_restart(). > 3. call smp_send_stop() only from machine_halt(), machine_power_off() and > machine_restart() > 4. require a hardware-based reboot method for all SMP implementations; > using soft_reboot() is not an option. > > This should get us into the situation where we have a reliable method of > halting and rebooting the kernel everywhere, leaving kexec as being the > remaining problem case. > > Currently, for that we effectively do smp_send_stop() followed by > smp_kill_cpus(). The no-op change for kexec there is to allow > smp_kill_cpus() to be called directly from machine_shutdown() - but > I suspect there will still be stuff that's broken with that... > > So the ongoing problem remains - how to deal with kexec in a SMP > environment where it's difficult to reliably take a secondary CPU > offline to a safe place and then be able to restart it into the > next kernel... For kexec, I think it's perfectly reasonable to mandate hardware-based offlining for the secondary cores (hence the half-hearted dependency on HOTPLUG_CPU). In that case, the only guy that has to go down the soft reboot path is the primary CPU which shouldn't be too problematic, right? Supporting sort-reboot of secondaries is a total PITA, even if you have some `safe place' to put them. You still have to synchronise with non-coherent cores so that you know when it's safe to clobber the old image, which requires complex locking algorithms and a prevailing wind. Will
diff --git a/arch/arm/mach-tegra/hotplug.c b/arch/arm/mach-tegra/hotplug.c index a52c10e..1676669 100644 --- a/arch/arm/mach-tegra/hotplug.c +++ b/arch/arm/mach-tegra/hotplug.c @@ -10,6 +10,8 @@ #include <linux/kernel.h> #include <linux/smp.h> #include <linux/clk/tegra.h> +#include <linux/cpu.h> +#include <linux/reboot.h> #include <asm/smp_plat.h> @@ -57,6 +59,21 @@ int tegra_cpu_disable(unsigned int cpu) } } +/* + * reboot notifier for avoid busy loop when trigger tegra_cpu_kill + * without tegra_cpu_die first + */ +static int tegra_cpu_kill_notify(struct notifier_block *nb, + unsigned long action, void *data) +{ + disable_nonboot_cpus(); + return NOTIFY_DONE; +} + +static struct notifier_block tegra_cpu_kill_nb = { + .notifier_call = tegra_cpu_kill_notify, +}; + void __init tegra_hotplug_init(void) { if (!IS_ENABLED(CONFIG_HOTPLUG_CPU)) @@ -68,4 +85,6 @@ void __init tegra_hotplug_init(void) tegra_hotplug_shutdown = tegra30_hotplug_shutdown; if (IS_ENABLED(CONFIG_ARCH_TEGRA_114_SOC) && tegra_chip_id == TEGRA114) tegra_hotplug_shutdown = tegra30_hotplug_shutdown; + + register_reboot_notifier(&tegra_cpu_kill_nb); }
The normal CPU hotplug flow in kernel and the flow for Tegra we expected, is checking the CPU ID is OK for hotplug by "tegra_cpu_disable", the CPU that would be hotplugged runs into a power-gate state by "tegra_cpu_die", then the other CPU waits for the CPU that was hotplugged in reset and clock gate it by "tegra_cpu_kill". That means we don't support the CPU being stopped or put into offline by trigger "tegra_cpu_kill" directly. It may cause a busy loop for waiting CPU in reset. After the commit "62e930e reboot: rigrate shutdown/reboot to boot cpu", we remove "disable_nonboot_cpus" when kernel_{restart,halt,power_off}. But the ARM kernel trigger "send_smp_stop" when machine_shutdown, that would cause the "tegra_cpu_kill" directly without "tegra_cpu_die" first. We hook "disable_nonboot_cpus" in "reboot_notifier" to avoid that happens. And it can work for reboot, shutdown, halt and kexec. Signed-off-by: Joseph Lo <josephl@nvidia.com> --- arch/arm/mach-tegra/hotplug.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+)