diff mbox

kexec: disable non-boot CPUs

Message ID 1355960681-32015-1-git-send-email-swarren@wwwdotorg.org (mailing list archive)
State New, archived
Headers show

Commit Message

Stephen Warren Dec. 19, 2012, 11:44 p.m. UTC
From: Stephen Warren <swarren@nvidia.com>

Both the regular reboot and shutdown paths do this. It seems reasonable
for kexec to work the same way.

On the Tegra ARM SoC at least, this change allows kexec to work with SMP
enabled. ARM's machine_shutdown() simply puts all CPUs into a loop. If
the code of that loop is over-written, the CPUs may crash, and cause the
kexec'd kernel not to be able to initialize them. In practice, this
causes the kexec'd kernel to hang and/or crash. The intended way to
solve this is for ARM machines to provide a cpu_kill SMP operation to
e.g. power down the CPUs, or place them in reset. However, at least on
Tegra, the implementation of that function would simply be duplicating
the hotplug code that already exists, so it seems simpler to just call
disable_nonboot_cpus() for the kexec path, just like reboot/shutdown.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
 kernel/kexec.c |    1 +
 1 file changed, 1 insertion(+)

Comments

Eric W. Biederman Dec. 19, 2012, 11:55 p.m. UTC | #1
Stephen Warren <swarren@wwwdotorg.org> writes:

> From: Stephen Warren <swarren@nvidia.com>
>
> Both the regular reboot and shutdown paths do this. It seems reasonable
> for kexec to work the same way.
>
> On the Tegra ARM SoC at least, this change allows kexec to work with SMP
> enabled. ARM's machine_shutdown() simply puts all CPUs into a loop. If
> the code of that loop is over-written, the CPUs may crash, and cause the
> kexec'd kernel not to be able to initialize them. In practice, this
> causes the kexec'd kernel to hang and/or crash. The intended way to
> solve this is for ARM machines to provide a cpu_kill SMP operation to
> e.g. power down the CPUs, or place them in reset. However, at least on
> Tegra, the implementation of that function would simply be duplicating
> the hotplug code that already exists, so it seems simpler to just call
> disable_nonboot_cpus() for the kexec path, just like reboot/shutdown.

I would like to ack this but I think this is one of those differences
for which there is a reason.

On x86 this happens in machine_shutdown.

Before this can be merged someone needs to look into the history and
see why we don't share this code path.

My hunch is because it was duplication of effort and the stop machine
code has historically been fragile.

Eric



> Signed-off-by: Stephen Warren <swarren@nvidia.com>
> ---
>  kernel/kexec.c |    1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/kernel/kexec.c b/kernel/kexec.c
> index 5e4bd78..6fe74d3 100644
> --- a/kernel/kexec.c
> +++ b/kernel/kexec.c
> @@ -1570,6 +1570,7 @@ int kernel_kexec(void)
>  #endif
>  	{
>  		kernel_restart_prepare(NULL);
> +		disable_nonboot_cpus();
>  		printk(KERN_EMERG "Starting new kernel\n");
>  		machine_shutdown();
>  	}
Will Deacon Dec. 20, 2012, 10:49 a.m. UTC | #2
Hi Stephen,

On Wed, Dec 19, 2012 at 11:44:41PM +0000, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> Both the regular reboot and shutdown paths do this. It seems reasonable
> for kexec to work the same way.
> 
> On the Tegra ARM SoC at least, this change allows kexec to work with SMP
> enabled. ARM's machine_shutdown() simply puts all CPUs into a loop. If
> the code of that loop is over-written, the CPUs may crash, and cause the
> kexec'd kernel not to be able to initialize them. In practice, this
> causes the kexec'd kernel to hang and/or crash. The intended way to
> solve this is for ARM machines to provide a cpu_kill SMP operation to
> e.g. power down the CPUs, or place them in reset. However, at least on
> Tegra, the implementation of that function would simply be duplicating
> the hotplug code that already exists, so it seems simpler to just call
> disable_nonboot_cpus() for the kexec path, just like reboot/shutdown.

If you do manage to get this merged, please can you follow up with a patch
to remove the smp_kill_cpus bits from arch/arm/kernel/smp.c please? It only
exists as a hook to do exactly this and currently nobody is using it afaict.

Cheers,

Will
Stephen Warren Dec. 20, 2012, 5:21 p.m. UTC | #3
On 12/20/2012 03:49 AM, Will Deacon wrote:
> Hi Stephen,
> 
> On Wed, Dec 19, 2012 at 11:44:41PM +0000, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> Both the regular reboot and shutdown paths do this. It seems reasonable
>> for kexec to work the same way.
>>
>> On the Tegra ARM SoC at least, this change allows kexec to work with SMP
>> enabled. ARM's machine_shutdown() simply puts all CPUs into a loop. If
>> the code of that loop is over-written, the CPUs may crash, and cause the
>> kexec'd kernel not to be able to initialize them. In practice, this
>> causes the kexec'd kernel to hang and/or crash. The intended way to
>> solve this is for ARM machines to provide a cpu_kill SMP operation to
>> e.g. power down the CPUs, or place them in reset. However, at least on
>> Tegra, the implementation of that function would simply be duplicating
>> the hotplug code that already exists, so it seems simpler to just call
>> disable_nonboot_cpus() for the kexec path, just like reboot/shutdown.
> 
> If you do manage to get this merged, please can you follow up with a patch
> to remove the smp_kill_cpus bits from arch/arm/kernel/smp.c please? It only
> exists as a hook to do exactly this and currently nobody is using it afaict.

I originally implemented this in
arch/arm/kernel/process.c:machine_shutdown(), which currently is:

void machine_shutdown(void)
{
#ifdef CONFIG_SMP
	smp_send_stop();
#endif
}

and I changed it to something like:

void machine_shutdown(void)
{
#ifdef CONFIG_HOTPLUG_CPU
 	disable_nonboot_cpus();
#elifdef CONFIG_SMP
	smp_send_stop();
#endif
}

... but then figured that moving it up into the core kexec code would be
better, so that everything always worked the same way.

Anyway, the change above addresses Eric's concern about isolating the
change to ARM. Does that seem like a reasonable thing for the ARM code
to do?
Will Deacon Dec. 20, 2012, 5:36 p.m. UTC | #4
On Thu, Dec 20, 2012 at 05:21:56PM +0000, Stephen Warren wrote:
> On 12/20/2012 03:49 AM, Will Deacon wrote:
> > If you do manage to get this merged, please can you follow up with a patch
> > to remove the smp_kill_cpus bits from arch/arm/kernel/smp.c please? It only
> > exists as a hook to do exactly this and currently nobody is using it afaict.
> 
> I originally implemented this in
> arch/arm/kernel/process.c:machine_shutdown(), which currently is:
> 
> void machine_shutdown(void)
> {
> #ifdef CONFIG_SMP
> 	smp_send_stop();
> #endif
> }
> 
> and I changed it to something like:
> 
> void machine_shutdown(void)
> {
> #ifdef CONFIG_HOTPLUG_CPU
>  	disable_nonboot_cpus();
> #elifdef CONFIG_SMP
> 	smp_send_stop();
> #endif
> }
> 
> ... but then figured that moving it up into the core kexec code would be
> better, so that everything always worked the same way.

Hmmm, isn't this racy: requiring the secondaries to hit idle and notice
they're offline and call cpu_die before the primary has replace the kernel
image?

> Anyway, the change above addresses Eric's concern about isolating the
> change to ARM. Does that seem like a reasonable thing for the ARM code
> to do?

I think you're better off using what we currently have and hanging your code
off platform_cpu_kill.

Will
Stephen Warren Dec. 20, 2012, 5:59 p.m. UTC | #5
On 12/20/2012 10:36 AM, Will Deacon wrote:
> On Thu, Dec 20, 2012 at 05:21:56PM +0000, Stephen Warren wrote:
>> On 12/20/2012 03:49 AM, Will Deacon wrote:
>>> If you do manage to get this merged, please can you follow up with a patch
>>> to remove the smp_kill_cpus bits from arch/arm/kernel/smp.c please? It only
>>> exists as a hook to do exactly this and currently nobody is using it afaict.
>>
>> I originally implemented this in
>> arch/arm/kernel/process.c:machine_shutdown(), which currently is:
>>
>> void machine_shutdown(void)
>> {
>> #ifdef CONFIG_SMP
>> 	smp_send_stop();
>> #endif
>> }
>>
>> and I changed it to something like:
>>
>> void machine_shutdown(void)
>> {
>> #ifdef CONFIG_HOTPLUG_CPU
>>  	disable_nonboot_cpus();
>> #elifdef CONFIG_SMP
>> 	smp_send_stop();
>> #endif
>> }
>>
>> ... but then figured that moving it up into the core kexec code would be
>> better, so that everything always worked the same way.
> 
> Hmmm, isn't this racy: requiring the secondaries to hit idle and notice
> they're offline and call cpu_die before the primary has replace the kernel
> image?

Isn't disable_nonboot_cpus() synchronous? If not, I imagine my original
patch wasn't any better in this respect, except that the hotunplug
happened earlier, and hence reduced the likelihood of actually seeing
any such issues.

>> Anyway, the change above addresses Eric's concern about isolating the
>> change to ARM. Does that seem like a reasonable thing for the ARM code
>> to do?
> 
> I think you're better off using what we currently have and hanging your code
> off platform_cpu_kill.

OK, I'll look into that. Joseph Lo just posted patches to implement
cpu_kill() on Tegra, which was needed to fix some issues in our hotplug
code anyway. Perhaps that will remove the need for any other changes...
Stephen Warren Dec. 20, 2012, 8:15 p.m. UTC | #6
On 12/20/2012 10:59 AM, Stephen Warren wrote:
> On 12/20/2012 10:36 AM, Will Deacon wrote:
>> On Thu, Dec 20, 2012 at 05:21:56PM +0000, Stephen Warren wrote:
>>> On 12/20/2012 03:49 AM, Will Deacon wrote:
>>>> If you do manage to get this merged, please can you follow up with a patch
>>>> to remove the smp_kill_cpus bits from arch/arm/kernel/smp.c please? It only
>>>> exists as a hook to do exactly this and currently nobody is using it afaict.
>>>
>>> I originally implemented this in
>>> arch/arm/kernel/process.c:machine_shutdown(), which currently is:
>>>
>>> void machine_shutdown(void)
>>> {
>>> #ifdef CONFIG_SMP
>>> 	smp_send_stop();
>>> #endif
>>> }
>>>
>>> and I changed it to something like:
>>>
>>> void machine_shutdown(void)
>>> {
>>> #ifdef CONFIG_HOTPLUG_CPU
>>>  	disable_nonboot_cpus();
>>> #elifdef CONFIG_SMP
>>> 	smp_send_stop();
>>> #endif
>>> }
>>>
>>> ... but then figured that moving it up into the core kexec code would be
>>> better, so that everything always worked the same way.
>>
>> Hmmm, isn't this racy: requiring the secondaries to hit idle and notice
>> they're offline and call cpu_die before the primary has replace the kernel
>> image?
> 
> Isn't disable_nonboot_cpus() synchronous? If not, I imagine my original
> patch wasn't any better in this respect, except that the hotunplug
> happened earlier, and hence reduced the likelihood of actually seeing
> any such issues.
> 
>>> Anyway, the change above addresses Eric's concern about isolating the
>>> change to ARM. Does that seem like a reasonable thing for the ARM code
>>> to do?
>>
>> I think you're better off using what we currently have and hanging your code
>> off platform_cpu_kill.
> 
> OK, I'll look into that. Joseph Lo just posted patches to implement
> cpu_kill() on Tegra, which was needed to fix some issues in our hotplug
> code anyway. Perhaps that will remove the need for any other changes...

Will,

I just remembered one other advantage of disable_nonboot_cpus(); it
always makes the kexec happen on the boot CPU. Without this, I believe
it's random whether CPU0 or CPU1 performs the kexec. I suspect it's most
likely to work if we can always kexec on the boot CPU rather than a
random CPU?

Joseph, Peter,

As you know, I've been looking into kexec[2] on Tegra. Here's a summary
of a few tests I did:

linux-next plus nothing in particular, SMP enabled:
* Hangs/crashes during kexec

linux-next + SMP disabled: kexec works

linux-next + hotunplug CPUs other than CPU0 before kexec: kexec works

Will Deacon suggested this was due to a missing implementation of struct
smp_operations .cpu_kill on Tegra, which means that when CPUn are
present, they're simply spinning executing code, and kexec will
eventually overwrite that code, causing all manner of problems. So,
since I noticed that Joseph posted an implementation of .cpu_kill for
Tegra, I tried that out[1]. It certainly doesn't solve the problem, and
in fact actually causes the kernel performing the kexec (rather than the
kexec'd kernel) to hang:

> [   26.291903] Starting new kernel
> [   47.309571] INFO: rcu_preempt detected stalls on CPUs/tasks: { 1} (detected by 0, t=2102 jiffies, g=211, c=210, q=37)
> [   47.323410] Task dump for CPU 1:
> [   47.329763] dd              R running      0   401      1 0x00000000
> [   47.339343] [<c0521690>] (__schedule+0x360/0x600) from [<c002d9b0>] (do_syslog+0x2b4/0x478)
> [   47.350952] [<c002d9b0>] (do_syslog+0x2b4/0x478) from [<ed86bb08>] (0xed86bb08)

Manually hotunplugging the CPUs first still works OK with those patches
applied though.

I /think/ kexec calls .cpu_kill() without having caused the CPU itself
to call .cpu_die() first? Did you allow for that possibility inside the
implementation you posted?

[1]
http://patchwork.ozlabs.org/patch/207601/
http://patchwork.ozlabs.org/patch/207602/

[2]
http://en.wikipedia.org/wiki/Kexec
Will Deacon Dec. 23, 2012, 11:06 a.m. UTC | #7
On Thu, Dec 20, 2012 at 08:15:30PM +0000, Stephen Warren wrote:
> On 12/20/2012 10:59 AM, Stephen Warren wrote:
> > On 12/20/2012 10:36 AM, Will Deacon wrote:
> >> I think you're better off using what we currently have and hanging your code
> >> off platform_cpu_kill.
> > 
> > OK, I'll look into that. Joseph Lo just posted patches to implement
> > cpu_kill() on Tegra, which was needed to fix some issues in our hotplug
> > code anyway. Perhaps that will remove the need for any other changes...
> 
> Will,

Hi Stephen,

> I just remembered one other advantage of disable_nonboot_cpus(); it
> always makes the kexec happen on the boot CPU. Without this, I believe
> it's random whether CPU0 or CPU1 performs the kexec. I suspect it's most
> likely to work if we can always kexec on the boot CPU rather than a
> random CPU?

It will be the same CPU as the one which made the kexec system call. If you
need to invoke this from a specific CPU, you currently have to do this in
userspace using something like the `taskset' utility. Linux can actually
boot on any physical CPU for ARM now (thanks to the logical map) but most
platforms have restrictions related to secure firmware.

Will
diff mbox

Patch

diff --git a/kernel/kexec.c b/kernel/kexec.c
index 5e4bd78..6fe74d3 100644
--- a/kernel/kexec.c
+++ b/kernel/kexec.c
@@ -1570,6 +1570,7 @@  int kernel_kexec(void)
 #endif
 	{
 		kernel_restart_prepare(NULL);
+		disable_nonboot_cpus();
 		printk(KERN_EMERG "Starting new kernel\n");
 		machine_shutdown();
 	}