Message ID | 1371067281-655-2-git-send-email-swarren@wwwdotorg.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Hi Stephen, On Wed, Jun 12, 2013 at 09:01:21PM +0100, Stephen Warren wrote: > From: Stephen Warren <swarren@nvidia.com> > > Add comments to machine_shutdown()/halt()/power_off()/restart() that > describe their purpose and/or requirements re: CPUs being active/not. > > In machine_shutdown(), replace the call to smp_send_stop() with a call to > disable_nonboot_cpus(). This completely disables all but one CPU, thus > satisfying the requirement that only a single CPU be active for kexec. > Adjust Kconfig dependencies for this change. > > In machine_halt()/power_off()/restart(), call smp_send_stop() directly, > rather than via machine_shutdown(); these functions don't need to > completely de-activate all CPUs using hotplug, but rather just quiesce > them. > > Remove smp_kill_cpus(), and its call from smp_send_stop(). > smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling > smp_ops.cpu_die() on the target CPUs first. At least some implementations > of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra, > for example. Since smp_send_stop() is only used for shutdown, halt, and > power-off, there is no need to attempt any kind of CPU hotplug here. > > Adjust Kconfig to reflect that machine_shutdown() (and hence kexec) > relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee > that hotplug will work, or even that hotplug is implemented for a > particular piece of HW that a multi-platform zImage runs on. Hence, add > error-checking to machine_kexec() to determine whether it did work. [...] > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c > index 282de48..6e8931c 100644 > --- a/arch/arm/kernel/process.c > +++ b/arch/arm/kernel/process.c > @@ -184,30 +184,61 @@ int __init reboot_setup(char *str) > > __setup("reboot=", reboot_setup); > > +/* > + * Called by kexec, immediately prior to machine_kexec(). > + * > + * This must completely disable all secondary CPUs; simply causing those CPUs > + * to execute e.g. a RAM-based pin loop is not sufficient. This allows the > + * kexec'd kernel to use any and all RAM as it sees fit, without having to > + * avoid any code or data used by any SW CPU pin loop. The CPU hotplug > + * functionality embodied in disable_nonboot_cpus() to achieve this. > + */ > void machine_shutdown(void) > { > -#ifdef CONFIG_SMP > - smp_send_stop(); > -#endif > + disable_nonboot_cpus(); > } Any reason not to move this into machine_kexec and leave machine_shutdown as a nop? Anyway, I'm on holiday without internet until Tuesday, so for these two patches: Acked-by: Will Deacon <will.deacon@arm.com> Will
On 06/13/2013 01:45 AM, Will Deacon wrote: > Hi Stephen, > > On Wed, Jun 12, 2013 at 09:01:21PM +0100, Stephen Warren wrote: >> From: Stephen Warren <swarren@nvidia.com> >> >> Add comments to machine_shutdown()/halt()/power_off()/restart() that >> describe their purpose and/or requirements re: CPUs being active/not. >> >> In machine_shutdown(), replace the call to smp_send_stop() with a call to >> disable_nonboot_cpus(). This completely disables all but one CPU, thus >> satisfying the requirement that only a single CPU be active for kexec. >> Adjust Kconfig dependencies for this change. >> >> In machine_halt()/power_off()/restart(), call smp_send_stop() directly, >> rather than via machine_shutdown(); these functions don't need to >> completely de-activate all CPUs using hotplug, but rather just quiesce >> them. >> >> Remove smp_kill_cpus(), and its call from smp_send_stop(). >> smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling >> smp_ops.cpu_die() on the target CPUs first. At least some implementations >> of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra, >> for example. Since smp_send_stop() is only used for shutdown, halt, and >> power-off, there is no need to attempt any kind of CPU hotplug here. >> >> Adjust Kconfig to reflect that machine_shutdown() (and hence kexec) >> relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee >> that hotplug will work, or even that hotplug is implemented for a >> particular piece of HW that a multi-platform zImage runs on. Hence, add >> error-checking to machine_kexec() to determine whether it did work. >> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c >> +/* >> + * Called by kexec, immediately prior to machine_kexec(). >> + * >> + * This must completely disable all secondary CPUs; simply causing those CPUs >> + * to execute e.g. a RAM-based pin loop is not sufficient. This allows the >> + * kexec'd kernel to use any and all RAM as it sees fit, without having to >> + * avoid any code or data used by any SW CPU pin loop. The CPU hotplug >> + * functionality embodied in disable_nonboot_cpus() to achieve this. >> + */ >> void machine_shutdown(void) >> { >> -#ifdef CONFIG_SMP >> - smp_send_stop(); >> -#endif >> + disable_nonboot_cpus(); >> } > > Any reason not to move this into machine_kexec and leave machine_shutdown > as a nop? Well, the function needs to exist to link the kernel, so I figured it may as well do something:-) Judging by the function names, it seems like machine_shutdown() should put the machine into a state where kexec can work, and machine_kexec() should do the final "jump". That matches doing disable_nonboot_cpus() in here. I don't know much about KEXEC_JUMP, but if it were ever implemented for ARM, I'm not sure we'd want to do the disable_nonboot_cpus() in that case(?), and machine_shutdown() is called if !KEXEC_JUMP, but machine_kexec() is called for both. It also means I get to put that comment right there, which is next to the equivalent commands for machine_halt/power_off/restart, rather than in some completely different place in machine_kexec.c. > Anyway, I'm on holiday without internet until Tuesday, so for these two > patches: > > Acked-by: Will Deacon <will.deacon@arm.com> Thanks! Russell, does this look good to you? I assume I should put this into the ARM patch tracker.
On Thu, Jun 13, 2013 at 08:45:51AM +0100, Will Deacon wrote: > Hi Stephen, > > On Wed, Jun 12, 2013 at 09:01:21PM +0100, Stephen Warren wrote: > > void machine_shutdown(void) > > { > > -#ifdef CONFIG_SMP > > - smp_send_stop(); > > -#endif > > + disable_nonboot_cpus(); > > } > > Any reason not to move this into machine_kexec and leave machine_shutdown > as a nop? If you take a peek at kernel/kexec.c, machine_shutdown() is only invoked if we haven't frozen processes and already disabled via hotplug the other CPUs - which will have already been done via disable_nonboot_cpus() in that path. So moving it to kexec would mean that path calls disable_nonboot_cpus() twice, which probably isn't a good thing.
On Thu, Jun 13, 2013 at 08:59:40AM -0600, Stephen Warren wrote: > Russell, does this look good to you? I assume I should put this into the > ARM patch tracker. I think it's fine. It would be nice if zhangfei gao would test these patches and confirm whether they fix it there as well.
On Fri, Jun 14, 2013 at 1:11 AM, Russell King - ARM Linux <linux@arm.linux.org.uk> wrote: > On Thu, Jun 13, 2013 at 08:59:40AM -0600, Stephen Warren wrote: >> Russell, does this look good to you? I assume I should put this into the >> ARM patch tracker. > > I think it's fine. It would be nice if zhangfei gao would test these > patches and confirm whether they fix it there as well. It also works here, pass two hours stress test of panic -> reboot. Tested-by: Zhangfei Gao <zhangfei.gao@gmail.com>
On 06/12/2013 02:01 PM, Stephen Warren wrote: > From: Stephen Warren <swarren@nvidia.com> > > Add comments to machine_shutdown()/halt()/power_off()/restart() that > describe their purpose and/or requirements re: CPUs being active/not. > > In machine_shutdown(), replace the call to smp_send_stop() with a call to > disable_nonboot_cpus(). This completely disables all but one CPU, thus > satisfying the requirement that only a single CPU be active for kexec. > Adjust Kconfig dependencies for this change. > > In machine_halt()/power_off()/restart(), call smp_send_stop() directly, > rather than via machine_shutdown(); these functions don't need to > completely de-activate all CPUs using hotplug, but rather just quiesce > them. > > Remove smp_kill_cpus(), and its call from smp_send_stop(). > smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling > smp_ops.cpu_die() on the target CPUs first. At least some implementations > of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra, > for example. Since smp_send_stop() is only used for shutdown, halt, and > power-off, there is no need to attempt any kind of CPU hotplug here. > > Adjust Kconfig to reflect that machine_shutdown() (and hence kexec) > relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee > that hotplug will work, or even that hotplug is implemented for a > particular piece of HW that a multi-platform zImage runs on. Hence, add > error-checking to machine_kexec() to determine whether it did work. Russell, The patch which initially triggered the problem [shutdown/reboot hangs on Tegra] (cf7df37 "reboot: rigrate shutdown/reboot to boot cpu") ended up going into v3.10; I assumed it was only going into v3.11. Is it possible to take this patch for v3.10 rather than v3.11? (or is your git-curr branch for 3.10; that's where your patchd told me this was applied.)
On Mon, Jun 17, 2013 at 12:58:53PM -0600, Stephen Warren wrote: > On 06/12/2013 02:01 PM, Stephen Warren wrote: > > From: Stephen Warren <swarren@nvidia.com> > > > > Add comments to machine_shutdown()/halt()/power_off()/restart() that > > describe their purpose and/or requirements re: CPUs being active/not. > > > > In machine_shutdown(), replace the call to smp_send_stop() with a call to > > disable_nonboot_cpus(). This completely disables all but one CPU, thus > > satisfying the requirement that only a single CPU be active for kexec. > > Adjust Kconfig dependencies for this change. > > > > In machine_halt()/power_off()/restart(), call smp_send_stop() directly, > > rather than via machine_shutdown(); these functions don't need to > > completely de-activate all CPUs using hotplug, but rather just quiesce > > them. > > > > Remove smp_kill_cpus(), and its call from smp_send_stop(). > > smp_kill_cpus() was indirectly calling smp_ops.cpu_kill() without calling > > smp_ops.cpu_die() on the target CPUs first. At least some implementations > > of smp_ops had issues with this; it caused cpu_kill() to hang on Tegra, > > for example. Since smp_send_stop() is only used for shutdown, halt, and > > power-off, there is no need to attempt any kind of CPU hotplug here. > > > > Adjust Kconfig to reflect that machine_shutdown() (and hence kexec) > > relies upon disable_nonboot_cpus(). However, this alone doesn't guarantee > > that hotplug will work, or even that hotplug is implemented for a > > particular piece of HW that a multi-platform zImage runs on. Hence, add > > error-checking to machine_kexec() to determine whether it did work. > > Russell, > > The patch which initially triggered the problem [shutdown/reboot hangs > on Tegra] (cf7df37 "reboot: rigrate shutdown/reboot to boot cpu") ended > up going into v3.10; I assumed it was only going into v3.11. > > Is it possible to take this patch for v3.10 rather than v3.11? (or is > your git-curr branch for 3.10; that's where your patchd told me this was > applied.) The concern I have is that we're now at -rc6, and my "fixes" branch for this time around is getting much larger than previous ones: 5 files changed, 14 insertions(+), 11 deletions(-) 3 files changed, 2 insertions(+), 3 deletions(-) 7 files changed, 45 insertions(+), 5 deletions(-) and it's currently looking like: 7 files changed, 61 insertions(+), 9 deletions(-) yes, not huge, but it's the wrong direction - and consider I've dropped one thing from fixes this morning because they were actually broken... and you're asking me to include this: 4 files changed, 42 insertions(+), 20 deletions(-) into that, which'll make it: 11 files changed, 103 insertions(+), 29 deletions(-) and it really starts to look like things are heading in the wrong direction... especially as far as Linus would be concerned for -rc6... I will try though. :)
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig index 42d6ea2..ecab9f7 100644 --- a/arch/arm/Kconfig +++ b/arch/arm/Kconfig @@ -2028,7 +2028,7 @@ config XIP_PHYS_ADDR config KEXEC bool "Kexec system call (EXPERIMENTAL)" - depends on (!SMP || HOTPLUG_CPU) + depends on (!SMP || PM_SLEEP_SMP) help kexec is a system call that implements the ability to shutdown your current kernel, and to start another kernel. It is like a reboot diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c index 8ef8c93..4fb074c 100644 --- a/arch/arm/kernel/machine_kexec.c +++ b/arch/arm/kernel/machine_kexec.c @@ -134,6 +134,10 @@ void machine_kexec(struct kimage *image) unsigned long reboot_code_buffer_phys; void *reboot_code_buffer; + if (num_online_cpus() > 1) { + pr_err("kexec: error: multiple CPUs still online\n"); + return; + } page_list = image->head & PAGE_MASK; diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index 282de48..6e8931c 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -184,30 +184,61 @@ int __init reboot_setup(char *str) __setup("reboot=", reboot_setup); +/* + * Called by kexec, immediately prior to machine_kexec(). + * + * This must completely disable all secondary CPUs; simply causing those CPUs + * to execute e.g. a RAM-based pin loop is not sufficient. This allows the + * kexec'd kernel to use any and all RAM as it sees fit, without having to + * avoid any code or data used by any SW CPU pin loop. The CPU hotplug + * functionality embodied in disable_nonboot_cpus() to achieve this. + */ void machine_shutdown(void) { -#ifdef CONFIG_SMP - smp_send_stop(); -#endif + disable_nonboot_cpus(); } +/* + * Halting simply requires that the secondary CPUs stop performing any + * activity (executing tasks, handling interrupts). smp_send_stop() + * achieves this. + */ void machine_halt(void) { - machine_shutdown(); + smp_send_stop(); + local_irq_disable(); while (1); } +/* + * Power-off simply requires that the secondary CPUs stop performing any + * activity (executing tasks, handling interrupts). smp_send_stop() + * achieves this. When the system power is turned off, it will take all CPUs + * with it. + */ void machine_power_off(void) { - machine_shutdown(); + smp_send_stop(); + if (pm_power_off) pm_power_off(); } +/* + * Restart requires that the secondary CPUs stop performing any activity + * while the primary CPU resets the system. Systems with a single CPU can + * use soft_restart() as their machine descriptor's .restart hook, since that + * will cause the only available CPU to reset. Systems with multiple CPUs must + * provide a HW restart implementation, to ensure that all CPUs reset at once. + * This is required so that any code running after reset on the primary CPU + * doesn't have to co-ordinate with other CPUs to ensure they aren't still + * executing pre-reset code, and using RAM that the primary CPU's code wishes + * to use. Implementing such co-ordination would be essentially impossible. + */ void machine_restart(char *cmd) { - machine_shutdown(); + smp_send_stop(); arm_pm_restart(reboot_mode, cmd); diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c index 550d63c..5919eb4 100644 --- a/arch/arm/kernel/smp.c +++ b/arch/arm/kernel/smp.c @@ -651,17 +651,6 @@ void smp_send_reschedule(int cpu) smp_cross_call(cpumask_of(cpu), IPI_RESCHEDULE); } -#ifdef CONFIG_HOTPLUG_CPU -static void smp_kill_cpus(cpumask_t *mask) -{ - unsigned int cpu; - for_each_cpu(cpu, mask) - platform_cpu_kill(cpu); -} -#else -static void smp_kill_cpus(cpumask_t *mask) { } -#endif - void smp_send_stop(void) { unsigned long timeout; @@ -679,8 +668,6 @@ void smp_send_stop(void) if (num_online_cpus() > 1) pr_warning("SMP: failed to stop secondary CPUs\n"); - - smp_kill_cpus(&mask); } /*