Message ID | 1357160861-26282-1-git-send-email-swarren@wwwdotorg.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jan 02, 2013 at 02:07:41PM -0700, Stephen Warren wrote: > From: Stephen Warren <swarren@nvidia.com> > > On Tegra at least, this change allows kexec to work with SMP enabled. > Without this, machine_shutdown() simply puts all CPUs into a loop. If > the code of that loop is over-written, the CPUs may hang or crash (which > I do observe in practice), or cause the kexec'd kernel not to be able to > initialize them. > > This fix has the added benefit that the kexec always happens on the boot > CPU, and thus kexec mirrors the initial kernel boot as much as possible. > > Signed-off-by: Stephen Warren <swarren@nvidia.com> > --- > Russell, I assume this should go into the ARM patch tracker if OK? I guess the discussions about getting this into the generic parts of the kernel came to nothing? So, yes. > > arch/arm/kernel/process.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c > index f79dd1e..1893bda 100644 > --- a/arch/arm/kernel/process.c > +++ b/arch/arm/kernel/process.c > @@ -239,6 +239,7 @@ __setup("reboot=", reboot_setup); > > void machine_shutdown(void) > { > + disable_nonboot_cpus(); > #ifdef CONFIG_SMP > smp_send_stop(); > #endif > -- > 1.7.10.4 >
On 01/02/13 13:07, Stephen Warren wrote: > From: Stephen Warren <swarren@nvidia.com> > > On Tegra at least, this change allows kexec to work with SMP enabled. > Without this, machine_shutdown() simply puts all CPUs into a loop. If > the code of that loop is over-written, the CPUs may hang or crash (which > I do observe in practice), or cause the kexec'd kernel not to be able to > initialize them. > > This fix has the added benefit that the kexec always happens on the boot > CPU, and thus kexec mirrors the initial kernel boot as much as possible. > > Signed-off-by: Stephen Warren <swarren@nvidia.com> > --- > Russell, I assume this should go into the ARM patch tracker if OK? > > arch/arm/kernel/process.c | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c > index f79dd1e..1893bda 100644 > --- a/arch/arm/kernel/process.c > +++ b/arch/arm/kernel/process.c > @@ -239,6 +239,7 @@ __setup("reboot=", reboot_setup); > > void machine_shutdown(void) > { > + disable_nonboot_cpus(); > #ifdef CONFIG_SMP > smp_send_stop(); > #endif How does this work in a CONFIG_SUSPEND=n build? It looks like disable_nonboot_cpus() would be a no-op and so we wouldn't actually hot unplug the other CPUs before sending the smp_send_stop(). And then the smp_send_stop() seems a little unnecessary if we do actually shutdown the other CPUs.
On Wed, Jan 02, 2013 at 11:59:02PM +0000, Stephen Boyd wrote: > On 01/02/13 13:07, Stephen Warren wrote: > > From: Stephen Warren <swarren@nvidia.com> > > > > On Tegra at least, this change allows kexec to work with SMP enabled. > > Without this, machine_shutdown() simply puts all CPUs into a loop. If > > the code of that loop is over-written, the CPUs may hang or crash (which > > I do observe in practice), or cause the kexec'd kernel not to be able to > > initialize them. > > > > This fix has the added benefit that the kexec always happens on the boot > > CPU, and thus kexec mirrors the initial kernel boot as much as possible. > > > > Signed-off-by: Stephen Warren <swarren@nvidia.com> > > --- > > Russell, I assume this should go into the ARM patch tracker if OK? > > > > arch/arm/kernel/process.c | 1 + > > 1 file changed, 1 insertion(+) > > > > diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c > > index f79dd1e..1893bda 100644 > > --- a/arch/arm/kernel/process.c > > +++ b/arch/arm/kernel/process.c > > @@ -239,6 +239,7 @@ __setup("reboot=", reboot_setup); > > > > void machine_shutdown(void) > > { > > + disable_nonboot_cpus(); > > #ifdef CONFIG_SMP > > smp_send_stop(); > > #endif > > How does this work in a CONFIG_SUSPEND=n build? It looks like > disable_nonboot_cpus() would be a no-op and so we wouldn't actually hot > unplug the other CPUs before sending the smp_send_stop(). And then the > smp_send_stop() seems a little unnecessary if we do actually shutdown > the other CPUs. You need the smp_send_stop call in order to send the cpu_kill (looks like tegra needs die and then kill). So you really need hotplug support as well as suspend for this to do much (if not, the secondaries end up spinning with interrupts disabled which is probably the best we can do anyway). We could add SUSPEND as a KEXEC dependency if SMP (we already have HOTPLUG there) if you like? Will
On Wed, Jan 02, 2013 at 09:07:41PM +0000, Stephen Warren wrote: > From: Stephen Warren <swarren@nvidia.com> > > On Tegra at least, this change allows kexec to work with SMP enabled. > Without this, machine_shutdown() simply puts all CPUs into a loop. If > the code of that loop is over-written, the CPUs may hang or crash (which > I do observe in practice), or cause the kexec'd kernel not to be able to > initialize them. > > This fix has the added benefit that the kexec always happens on the boot > CPU, and thus kexec mirrors the initial kernel boot as much as possible. > > Signed-off-by: Stephen Warren <swarren@nvidia.com> Acked-by: Will Deacon <will.deacon@arm.com> Cheers Stephen, Will
On Thu, Jan 03, 2013 at 12:02:59PM +0000, Will Deacon wrote: > You need the smp_send_stop call in order to send the cpu_kill (looks like > tegra needs die and then kill). So you really need hotplug support as well > as suspend for this to do much (if not, the secondaries end up spinning > with interrupts disabled which is probably the best we can do anyway). > > We could add SUSPEND as a KEXEC dependency if SMP (we already have HOTPLUG > there) if you like? Or we could look into bringing in the code to do this when KEXEC is enabled - which would mean an amount of restructuring of the Kconfig files. In other words, go from this: config SUSPEND bool "Suspend to RAM and standby" depends on ARCH_SUSPEND_POSSIBLE default y config HIBERNATE_CALLBACKS bool config PM_SLEEP def_bool y depends on SUSPEND || HIBERNATE_CALLBACKS config PM_SLEEP_SMP def_bool y depends on SMP depends on ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE depends on PM_SLEEP select HOTPLUG select HOTPLUG_CPU to: config SUSPEND bool... depends on ARCH_SUSPEND_POSSIBLE default y select PM_SLEEP config HIBERNATE_CALLBACKS bool select PM_SLEEP config PM_SLEEP bool select PM_SLEEP_SMP if SMP && (ARCH_SUSPEND_POSSIBLE || ARCH_HIBERNATION_POSSIBLE) config PM_SLEEP_SMP bool select HOTPLUG select HOTPLUG_CPU This means we could have KEXEC select PM_SLEEP_SMP (and maybe PM_SLEEP) as required, rather than bringing in the entire suspend support just to have working kexec. However, how many platforms don't have suspend support enabled?
On Thu, Jan 03, 2013 at 12:21:00PM +0000, Russell King - ARM Linux wrote: > This means we could have KEXEC select PM_SLEEP_SMP (and maybe PM_SLEEP) as > required, rather than bringing in the entire suspend support just to have > working kexec. > > However, how many platforms don't have suspend support enabled? I was looking into something similar to this for Kirkwood - my interest was in getting kexec to work, which means it needs to call the hibernation callbacks in the drivers prior to execing, otherwise there is a fairly high chance DMA will corrupt the kexec target. So this means the code needs the suspend support, however Kirkwood doesn't have any support for ARM_CPU_SUSPEND, which blocks enabling suspend at all for that platform. I was looking at moving the 'depends on CPU_..' from ARCH_SUSPEND_POSSIBLE to ARM_CPU_SUSPEND, which seemed to do the trick for kirkwood but I didn't study every possibility.. Regards, Jason
On 01/03/2013 05:21 AM, Russell King - ARM Linux wrote: > On Thu, Jan 03, 2013 at 12:02:59PM +0000, Will Deacon wrote: >> You need the smp_send_stop call in order to send the cpu_kill (looks like >> tegra needs die and then kill). So you really need hotplug support as well >> as suspend for this to do much (if not, the secondaries end up spinning >> with interrupts disabled which is probably the best we can do anyway). >> >> We could add SUSPEND as a KEXEC dependency if SMP (we already have HOTPLUG >> there) if you like? > > Or we could look into bringing in the code to do this when KEXEC is > enabled - which would mean an amount of restructuring of the Kconfig > files. I'm not sure if any of this thread means we should hold off on this patch, or just that the Kconfig could/should be enhanced later? One thing I did just notice with my patch: disable_nonboot_cpus() ends up being called twice for the poweroff path: [ 30.461847] Disabling non-boot CPUs ... [ 30.478797] CPU1: shutdown [ 30.492104] Power down. [ 30.494578] Disabling non-boot CPUs ... Is this worth worrying about? It's because kernel_power_off() calls disable_nonboot_cpus() then machine_shutdown() which also calls disable_nonboot_cpus().
On Thu, Jan 03, 2013 at 08:26:15PM +0000, Stephen Warren wrote: > On 01/03/2013 05:21 AM, Russell King - ARM Linux wrote: > > On Thu, Jan 03, 2013 at 12:02:59PM +0000, Will Deacon wrote: > >> You need the smp_send_stop call in order to send the cpu_kill (looks like > >> tegra needs die and then kill). So you really need hotplug support as well > >> as suspend for this to do much (if not, the secondaries end up spinning > >> with interrupts disabled which is probably the best we can do anyway). > >> > >> We could add SUSPEND as a KEXEC dependency if SMP (we already have HOTPLUG > >> there) if you like? > > > > Or we could look into bringing in the code to do this when KEXEC is > > enabled - which would mean an amount of restructuring of the Kconfig > > files. > > I'm not sure if any of this thread means we should hold off on this > patch, or just that the Kconfig could/should be enhanced later? Well, Russell's suggestion looked easy enough to have a crack out so you could always post a series implementing it along with this patch. > One thing I did just notice with my patch: disable_nonboot_cpus() ends > up being called twice for the poweroff path: > > [ 30.461847] Disabling non-boot CPUs ... > [ 30.478797] CPU1: shutdown > [ 30.492104] Power down. > [ 30.494578] Disabling non-boot CPUs ... > > Is this worth worrying about? It's harmless but it's also pretty horrible. Unfortunately, I don't see what we can do about it: it's a direct side-effect of generic code calling disable_nonboot_cpus for poweroff and not for kexec. Will
On Sun, Jan 06, 2013 at 04:22:00PM +0000, Will Deacon wrote: > On Thu, Jan 03, 2013 at 08:26:15PM +0000, Stephen Warren wrote: > > On 01/03/2013 05:21 AM, Russell King - ARM Linux wrote: > > > On Thu, Jan 03, 2013 at 12:02:59PM +0000, Will Deacon wrote: > > >> You need the smp_send_stop call in order to send the cpu_kill (looks like > > >> tegra needs die and then kill). So you really need hotplug support as well > > >> as suspend for this to do much (if not, the secondaries end up spinning > > >> with interrupts disabled which is probably the best we can do anyway). > > >> > > >> We could add SUSPEND as a KEXEC dependency if SMP (we already have HOTPLUG > > >> there) if you like? > > > > > > Or we could look into bringing in the code to do this when KEXEC is > > > enabled - which would mean an amount of restructuring of the Kconfig > > > files. > > > > I'm not sure if any of this thread means we should hold off on this > > patch, or just that the Kconfig could/should be enhanced later? > > Well, Russell's suggestion looked easy enough to have a crack out so you > could always post a series implementing it along with this patch. > > > One thing I did just notice with my patch: disable_nonboot_cpus() ends > > up being called twice for the poweroff path: > > > > [ 30.461847] Disabling non-boot CPUs ... > > [ 30.478797] CPU1: shutdown > > [ 30.492104] Power down. > > [ 30.494578] Disabling non-boot CPUs ... > > > > Is this worth worrying about? > > It's harmless but it's also pretty horrible. Unfortunately, I don't see > what we can do about it: it's a direct side-effect of generic code calling > disable_nonboot_cpus for poweroff and not for kexec. I think we're starting to hit the age old problem with software: over time it becomes more difficult to change established software as it becomes more risky to make changes, and no one wants to make those changes through fear of breakign something else. Eventually, this results in someone declaring that the current project is beyond hope, and the next Linus Torvalds comes along and the next Linux kernel project will be born. That probably isn't want people want, but it's what will eventually happen when things just get too painful. Unless, of course, we start pushing back now and don't leave issues to fester. However, that said, I can see why kexec does this. I've not looked too deeply into kexec, but I'd image that the path where the secondary CPUs are not taken offline prior to kexec happening is to allow kexec to be used for crash recovery, where you don't want to do too much in case you re-tickle the bug that caused the original crash. I suspect what platforms like x86 do in this scenario (via machine_shutdown) is to forcefully put the secondary CPUs into reset and hold them there until the new kexec'd kernel gets going. The problem is... on ARM we're yet again shot in the foot through the unwillingness to architect certain aspects of the system: there is no architecturally known way to place CPUs into a reset state once they've been started up. In other words, once a CPU starts executing kernel code, it needs to always execute kernel code or there must be some kind of platform specific hook to disable it. Maybe that's the answer here: have machine_shutdown() call out to some platform specific function to do the forced-takedown of secondary CPUs, and if there's no such specific function, then we use our present CPU stopping method of making them spin in a WFI loop with IRQs disabled.
Russell King - ARM Linux <linux@arm.linux.org.uk> writes: > On Sun, Jan 06, 2013 at 04:22:00PM +0000, Will Deacon wrote: >> On Thu, Jan 03, 2013 at 08:26:15PM +0000, Stephen Warren wrote: >> > On 01/03/2013 05:21 AM, Russell King - ARM Linux wrote: >> > > On Thu, Jan 03, 2013 at 12:02:59PM +0000, Will Deacon wrote: >> > >> You need the smp_send_stop call in order to send the cpu_kill (looks like >> > >> tegra needs die and then kill). So you really need hotplug support as well >> > >> as suspend for this to do much (if not, the secondaries end up spinning >> > >> with interrupts disabled which is probably the best we can do anyway). >> > >> >> > >> We could add SUSPEND as a KEXEC dependency if SMP (we already have HOTPLUG >> > >> there) if you like? >> > > >> > > Or we could look into bringing in the code to do this when KEXEC is >> > > enabled - which would mean an amount of restructuring of the Kconfig >> > > files. >> > >> > I'm not sure if any of this thread means we should hold off on this >> > patch, or just that the Kconfig could/should be enhanced later? >> >> Well, Russell's suggestion looked easy enough to have a crack out so you >> could always post a series implementing it along with this patch. >> >> > One thing I did just notice with my patch: disable_nonboot_cpus() ends >> > up being called twice for the poweroff path: >> > >> > [ 30.461847] Disabling non-boot CPUs ... >> > [ 30.478797] CPU1: shutdown >> > [ 30.492104] Power down. >> > [ 30.494578] Disabling non-boot CPUs ... >> > >> > Is this worth worrying about? >> >> It's harmless but it's also pretty horrible. Unfortunately, I don't see >> what we can do about it: it's a direct side-effect of generic code calling >> disable_nonboot_cpus for poweroff and not for kexec. > > I think we're starting to hit the age old problem with software: over > time it becomes more difficult to change established software as it > becomes more risky to make changes, and no one wants to make those > changes through fear of breakign something else. > > Eventually, this results in someone declaring that the current project > is beyond hope, and the next Linus Torvalds comes along and the next > Linux kernel project will be born. That probably isn't want people want, > but it's what will eventually happen when things just get too painful. > > Unless, of course, we start pushing back now and don't leave issues to > fester. > > However, that said, I can see why kexec does this. I've not looked too > deeply into kexec, but I'd image that the path where the secondary CPUs > are not taken offline prior to kexec happening is to allow kexec to > be used for crash recovery, where you don't want to do too much in case > you re-tickle the bug that caused the original crash. > > I suspect what platforms like x86 do in this scenario (via > machine_shutdown) is to forcefully put the secondary CPUs into reset > and hold them there until the new kexec'd kernel gets going. > > The problem is... on ARM we're yet again shot in the foot through the > unwillingness to architect certain aspects of the system: there is no > architecturally known way to place CPUs into a reset state once they've > been started up. In other words, once a CPU starts executing kernel > code, it needs to always execute kernel code or there must be some kind > of platform specific hook to disable it. > > Maybe that's the answer here: have machine_shutdown() call out to some > platform specific function to do the forced-takedown of secondary CPUs, > and if there's no such specific function, then we use our present CPU > stopping method of making them spin in a WFI loop with IRQs disabled. Yes. On x86 we have had the generic equivalent of disable_nonboot_cpus in the machine_shutdown for a long time. Now I don't know the full history of what led someone to make a conscious choice to not call disable_nonboot_cpus() in kernel_kexec() before machine_shutdown() and machine_kexec(). But it was a deliberate choice and someone needs to dig up the reasons and review if the reasons are still valid before we can change the generic code path without introducing regressions. I remember there being some problems when disable_nonboot_cpus() was added originially. And if disable_nonboot_cpus disappears in configurations without power managment that is another problem, because platforms that have SMP always need that logic to happen. But what it looks like is whoever put disable_nonboot_cpus() on the kernel reboot path didn't do their homework, did a sloppy job and left us with a situation where we need to write both generic code and arch specific code to do the same thing. I have cleaned up the mess that is the reboot path once a bunch of years ago, and apparently it is deteriorating again. Unfortunately right now I don't have the time or inclination to sort out whatever the nonsense disable_nonboot_cpus() has become. What is required of machine_shutdown() is clear even if there is duplication with disable_nonboot_cpus(). Please note the only code path that generically calls machine_shutdown() is kernel_kexec() so if you wish you can avoid duplication by refactoring you architecture specific code. Eric
Hi Eric, On Mon, Jan 07, 2013 at 01:53:30AM +0000, Eric W. Biederman wrote: > Russell King - ARM Linux <linux@arm.linux.org.uk> writes: > > The problem is... on ARM we're yet again shot in the foot through the > > unwillingness to architect certain aspects of the system: there is no > > architecturally known way to place CPUs into a reset state once they've > > been started up. In other words, once a CPU starts executing kernel > > code, it needs to always execute kernel code or there must be some kind > > of platform specific hook to disable it. > > > > Maybe that's the answer here: have machine_shutdown() call out to some > > platform specific function to do the forced-takedown of secondary CPUs, > > and if there's no such specific function, then we use our present CPU > > stopping method of making them spin in a WFI loop with IRQs disabled. > > Yes. On x86 we have had the generic equivalent of disable_nonboot_cpus > in the machine_shutdown for a long time. Ok, we can do that that on ARM too, but we currently call machine_shutdown from machine_{power_off, halt, restart} so we just need to fix that and we will be doing functionally the same thing as x86. > Please note the only code path that generically calls machine_shutdown() > is kernel_kexec() so if you wish you can avoid duplication by > refactoring you architecture specific code. Exactly, that sounds like the right way forward to me. Thanks. Will
On Sun, Jan 06, 2013 at 05:53:30PM -0800, Eric W. Biederman wrote: > I have cleaned up the mess that is the reboot path once a bunch of years > ago, and apparently it is deteriorating again. Unfortunately, that's what happens when lots of cooks get their fingers in a pie with no coordination. This is why having maintainers is soo important for code - a good maintainer ensures that the code remains high quality by whatever means. Code without maintainers is subject to modification in all kinds of random ways, including duplicating code amongst architectures which should be generic code - that happens because no one wants to understand what other architectures require. Your original cleanups, afaik, are still all intact. The problem is that since then, kexec has come along, and invented a new callback (machine_shutdown) which gets called while the system is still live in full SMP mode (and presumably all the devices are still running and potentially scribbling over memory with their DMA.) Meanwhile kexec wants to pass control to the new kernel... that doesn't sound particularly clever to me.
On 01/06/2013 06:53 PM, Eric W. Biederman wrote: ... > Yes. On x86 we have had the generic equivalent of disable_nonboot_cpus > in the machine_shutdown for a long time. It looks like the x86 implementation does a bit more than disable_nonboot_cpus(): disable_nonboot_cpus(): find first cpu in online cpu mask disable everything else x86's machine_shutdown(): default to rebooting on cpu 0 if user specified a different cpu, override default bring that cpu online disable everything else So, x86 always kexec's on a specific CPU, whereas if we use disable_nonboot_cpus() on ARM, we'll end up kexec'ing on whichever is the first online CPU, which might not be the actual boot CPU, and can vary. On Tegra this doesn't (currently?) matter since CPU 0 can't be taken offline due to our CPU hotplug driver disallowing it. But, perhaps other SoCs or future Tegra versions don't/won't have this restriction, so the difference will be material. Should the x86 code be lifted into the implementation of disable_nonboot_cpus()? For the record, here's what I can tell about what the various arch-specific machine_shutdown() do: ARM, ARM64: calls smp_send_stop() -> disable_nonboot_cpus() would be correct IA64: shuts down all CPUs except the current one -> disable_nonboot_cpus() would be correct Microblaze: nothing (but no SMP support?) -> disable_nonboot_cpus() would be irrelevant, but fine MIPS: machine-specific: Cavium Octeon: Shuts down CPUs, waits until num_online_cpus()==1 Not sure which CPU isn't shut down though; the current one? Others: Nothing (but at least some have SMP support) -> disable_nonboot_cpus() would be a behaviour change PPC: machine-specific Only implementations either aren't for SMP, or do nothing (but presumably many have SMP support) -> disable_nonboot_cpus() would be a behaviour change SH: smp_send_stop() -> disable_nonboot_cpus() would be correct S390: nothing (but appears to have SMP support) -> disable_nonboot_cpus() would be a behaviour change Tile: nothing (but bans kexec unless no SMP or only 1 CPU online) -> disable_nonboot_cpus() would be irrelevant, but fine, and perhaps even allow removal of the kexec ban for SMP? So at least I don't see anything that would particularly indicate that having the kexec generic/core code call disable_nonboot_cpus() would cause problems; many architectures have done something like that themselves. That said, it certainly would cause some behavioural differences on some big platforms like PPC...
Russell King - ARM Linux <linux@arm.linux.org.uk> writes: > On Sun, Jan 06, 2013 at 05:53:30PM -0800, Eric W. Biederman wrote: >> I have cleaned up the mess that is the reboot path once a bunch of years >> ago, and apparently it is deteriorating again. > > Unfortunately, that's what happens when lots of cooks get their fingers > in a pie with no coordination. This is why having maintainers is soo > important for code - a good maintainer ensures that the code remains > high quality by whatever means. > > Code without maintainers is subject to modification in all kinds of > random ways, including duplicating code amongst architectures which > should be generic code - that happens because no one wants to understand > what other architectures require. I don't think you meant to but I have been rather strongly insulted. With what time I can find I have been maintaining kexec. Which I admit has mostly been telling people don't do that silly stupid thing. > Your original cleanups, afaik, are still all intact. The problem is > that since then, kexec has come along, and invented a new callback > (machine_shutdown) which gets called while the system is still live > in full SMP mode (and presumably all the devices are still running > and potentially scribbling over memory with their DMA.) The devices should not be running. The sequence of code is. kernel_restart_prepare(); machine_shutdown(); machine_kexec(); The reboot notifier and all of the device shutdown methods have been called in kernel_restart_prepare(). If drivers don't stop dma in their shutdown method that is a driver bug. I wish we could use the remove method but I lost that fight a decade ago. It is exactly the same sequence we use for a normal reboot except for the the lack of disable_nonboot_cpus(). > Meanwhile > kexec wants to pass control to the new kernel... that doesn't sound > particularly clever to me. Because I was annoyed I just stopped and between faded memories and reading the kernel logs I think I can reconstruct how disable_nonboot_cpus() got into the places it has gotten. Ages and ages ago (the logs say Oct 2007) disable_nonboot_cpus() stopped being a pure cpu-hotplug and power management thing and started being called from kernel_power_off. I objected to it's inclusion on the reboot path but there was some deep magic ACPI ordering mess that necessitated it be called in kernel_power_off. In retrospect that disable_nonboot_cpus() call should have been x86 specific. As I recall disable_nonboot_cpus() at the time was a cpu hotplug hack and reading through the code again tonight it appears that disable_noboot_cpus() continues to be a messed up cpu hotplug hack. disable_nonboot_cpus() should really be called sometimes_dangerously_hotunplug_all_but_one_cpu(). If that code is going to be something other than power management specific it is not cool that disable_nonboot_cpus() is not always enabled when SMP is enabled. It means that architectures need to implement two different ways of shutting down cpus. One of the truly nasty things about cpu_hotplug is that it requires that irqs be migrated away from a cpu with interrupts disabled, which at least on x86 in some interrupt delivery modes is impossible to do safely. The only way to losslessly (and without wedging irq controllers) in those interrupt delivery modes (needed for more than 8 cpus) is to migrate an irq in it's irq handler. Which is fine for setting /proc/irq/$N/smp_affinity but is useless for cpu hot-unplug, where we need to guarantee that all irqs are going to stop hitting a cpu. Now sometimes_dangerously_hotunplug_all_but_one_cpu() on the reboot path was added just a few months ago in Oct 2012, and it appears to due to weird ARM maintainership. At least the x86 reboot_cpu_id option is broken due to that addition. The unlikely but very real ways to wedge your cpu during a reboot still remain in those code paths and I expect we simply have not seen regressions yet as there has not been enough time for the regressions to be noticed and tracked down. Now given that the inconsistences were caused by crazy ARM maintainership and that disable_nonboot_cpus() is still crazy enough to make me scared thinking about the code. We should remove disable_nonboot_cpus() from the reboot path. It is still a crazy unmaintained cpu hotplug mess. Move disable_nonboot_cpus() into machine_shutdown in the ARM reboot path if you want to rely on that crazy piece of code. Or possibly you can write something ARM specific that requires less work and is safer. Eric
Thanks for the cross architecture survey. Stephen Warren <swarren@wwwdotorg.org> writes: > On 01/06/2013 06:53 PM, Eric W. Biederman wrote: > ... >> Yes. On x86 we have had the generic equivalent of disable_nonboot_cpus >> in the machine_shutdown for a long time. > > It looks like the x86 implementation does a bit more than > disable_nonboot_cpus(): > > disable_nonboot_cpus(): > find first cpu in online cpu mask > disable everything else > > x86's machine_shutdown(): > default to rebooting on cpu 0 > if user specified a different cpu, override default > bring that cpu online > disable everything else That you can no longer specify a reboot_cpu_id on x86 is a regression since Oct caused by people fixing another piece of the ARM SMP reboot. Not that I know it is knew I wonder what other x86 regressions it causes. The migrating of irqs to different cpus with interrupts disabled on interrupt the architectural x86 interrupt controllers that don't support that and that I have experimentally verified can be locked on with no recovery short of power cycling gives me the willies. The actual reboot_cpu_id option was someones first hack at getting the reboot to happen on the boot cpu so it may not matter much as long as we reboot on the boot cpu. reboot_cpu_id really wasn't a kexec thing in this case. > So, x86 always kexec's on a specific CPU, whereas if we use > disable_nonboot_cpus() on ARM, we'll end up kexec'ing on whichever is > the first online CPU, which might not be the actual boot CPU, and can > vary. > On Tegra this doesn't (currently?) matter since CPU 0 can't be taken > offline due to our CPU hotplug driver disallowing it. But, perhaps other > SoCs or future Tegra versions don't/won't have this restriction, so the > difference will be material. > Should the x86 code be lifted into the implementation of > disable_nonboot_cpus()? disable_nonboot_cpus() roughly as it exists today is needed for hibernation and some silliness like that. I don't have any problem with generic code in the reboot path doing: if (cpu_online(0)) set_cpus_allowed_ptr(current, cpumask_of(0)); All of the code in native_stop_other_cpus() on x86 is both simple and pretty architecture specific so I don't see much if any point in making a generic version. At most there is an argument for an arch specific stop_other_cpus() call. I have a real hard time believe that power management and cpu hot-unplug need to be as complex as disable_nonboot_cpus() suggest. For justifying disable_nonboot_cpus() there is the question of how many architectures is it safe to migrate irqs from one cpu to another with irqs disabled. > For the record, here's what I can tell about what the various > arch-specific machine_shutdown() do: > > ARM, ARM64: calls smp_send_stop() > -> disable_nonboot_cpus() would be correct > > IA64: shuts down all CPUs except the current one > -> disable_nonboot_cpus() would be correct > > Microblaze: nothing (but no SMP support?) > -> disable_nonboot_cpus() would be irrelevant, but fine > > MIPS: machine-specific: > Cavium Octeon: Shuts down CPUs, waits until num_online_cpus()==1 > Not sure which CPU isn't shut down though; the current one? > Others: Nothing (but at least some have SMP support) > -> disable_nonboot_cpus() would be a behaviour change > > PPC: machine-specific > Only implementations either aren't for SMP, or do nothing (but > presumably many have SMP support) > -> disable_nonboot_cpus() would be a behaviour change > > SH: smp_send_stop() > -> disable_nonboot_cpus() would be correct > > S390: nothing (but appears to have SMP support) > -> disable_nonboot_cpus() would be a behaviour change > > Tile: nothing (but bans kexec unless no SMP or only 1 CPU online) > -> disable_nonboot_cpus() would be irrelevant, but fine, and perhaps > even allow removal of the kexec ban for SMP? > > So at least I don't see anything that would particularly indicate that > having the kexec generic/core code call disable_nonboot_cpus() would > cause problems; many architectures have done something like that > themselves. That said, it certainly would cause some behavioural > differences on some big platforms like PPC... Behavior differences someone introduced just a kernel ago in the PPC reboot code to deal with ARMs lack of code in machine_shutdown to handle this at all. Eric
On Thu, Jan 10, 2013 at 09:59:35PM -0800, Eric W. Biederman wrote: > Russell King - ARM Linux <linux@arm.linux.org.uk> writes: > > > On Sun, Jan 06, 2013 at 05:53:30PM -0800, Eric W. Biederman wrote: > >> I have cleaned up the mess that is the reboot path once a bunch of years > >> ago, and apparently it is deteriorating again. > > > > Unfortunately, that's what happens when lots of cooks get their fingers > > in a pie with no coordination. This is why having maintainers is soo > > important for code - a good maintainer ensures that the code remains > > high quality by whatever means. > > > > Code without maintainers is subject to modification in all kinds of > > random ways, including duplicating code amongst architectures which > > should be generic code - that happens because no one wants to understand > > what other architectures require. > > I don't think you meant to but I have been rather strongly insulted. Why would you be insulted by the above? It's a statement of fact that is true for anything. Ever heard of the expression "Too many cooks spoil the broth" ? Does that expression insult you too? You've admitted that you don't have time to look at the shutdown/reboot paths very much, so they are by way of that fact lacking in maintainer input - but if you read the rest of what I wrote, I also commented that they were still in pretty good shape inspite of that. So actually, there's a complement there. Shame you missed it. > Now sometimes_dangerously_hotunplug_all_but_one_cpu() on the reboot path > was added just a few months ago in Oct 2012, and it appears to due to > weird ARM maintainership. Err. No. Stop trying to accuse people of being crap when there's no evidence: commit f96972f2dc6365421cf2366ebd61ee4cf060c8d5 Author: Shawn Guo <shawn.guo@linaro.org> Date: Thu Oct 4 17:12:23 2012 -0700 kernel/sys.c: call disable_nonboot_cpus() in kernel_restart() ... Signed-off-by: Shawn Guo <shawn.guo@linaro.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> No ARM maintainers were involved in that change. I just checked my mailboxes for September 2012 which is when it was mailed to the list. It got missed, and Andrew replied and accepted the patch (as is illustrated by the sign-offs). And even if _I_ had been involved with it, _I_ would not have spotted the x86 behaviour, just the same that Andrew didn't. Is this not itself evidence of my comment above: had the above change gone through someone who had a deep understanding of that code - iow, a maintainer, it probably would not have been committed to the kernel? To accuse others who have nothing to do with the area, and weren't involved with the change in any way of being responsible for that change is... well... I don't think there's anything that can be said to that which isn't inflamatory. It seems that, given your comments, the above commit needs to be reverted, and then _maybe_ the disable_nonboot_cpus() call should be added to ARMs machine_shutdown() instead.
On 01/10/2013 10:59 PM, Eric W. Biederman wrote: ... > disable_nonboot_cpus() should really be called > sometimes_dangerously_hotunplug_all_but_one_cpu(). > > If that code is going to be something other than power management > specific it is not cool that disable_nonboot_cpus() is not always > enabled when SMP is enabled. It means that architectures need to > implement two different ways of shutting down cpus. > > One of the truly nasty things about cpu_hotplug is that it requires that > irqs be migrated away from a cpu with interrupts disabled, which at > least on x86 in some interrupt delivery modes is impossible to do > safely. The only way to losslessly (and without wedging irq > controllers) in those interrupt delivery modes (needed for more than 8 > cpus) is to migrate an irq in it's irq handler. Which is fine for > setting /proc/irq/$N/smp_affinity but is useless for cpu hot-unplug, > where we need to guarantee that all irqs are going to stop hitting a > cpu. I'm a little confused about this; disable_nonboot_cpus() calls _cpu_down() to hot-unplug the CPU, and the regular CPU hotplug path is drivers/base/cpu.c:store_online() -> cpu_down() -> _cpu_down(), i.e. the same basic code. Given that, why is CPU hot unplug safe on x86 via sysfs (well, I'm just assuming it must be, since it's enabled in my regular distro kernel and appears to work fine), but not safe when it's done from disable_nonboot_cpus()? > Now sometimes_dangerously_hotunplug_all_but_one_cpu() on the reboot path > was added just a few months ago in Oct 2012, and it appears to due to > weird ARM maintainership. At least the x86 reboot_cpu_id option is > broken due to that addition. I guess this is true because reboot_cpu_id() isn't honored unless that CPU is already offline, and disable_nonboot_cpus() probably took it offline on average. Perhaps x86's native_machine_shutdown() should explicitly bring that CPU online if it's in cpu_possible_mask? Or perhaps disable_nonboot_cpus() should be enhanced with x86's reboot_cpu_id logic, so it simply works the same way across all architectures; that way some chunk of x86's native_machine_shutdown() could be made common. ... > We should remove disable_nonboot_cpus() from the reboot path. It is > still a crazy unmaintained cpu hotplug mess. Even if we updated it to standardize the reboot_cpu_id logic across all architectures, and hence made it still suitable for x86?
On 01/10/2013 11:28 PM, Eric W. Biederman wrote: ... > I don't have any problem with generic code in the reboot path > doing: > if (cpu_online(0)) > set_cpus_allowed_ptr(current, cpumask_of(0)); It looks like that API just affects the scheduler, and not whether the other CPUs are actually active/hot-plugged-in. At least for my use-case, I need something that really disables the other CPUs so they aren't executing code, hence my tendency to hot-un-plug them using disable_nonboot_cpus(), rather than just shift task execution off them using the code above. I wonder if all architectures shouldn't always do the following in all reboot/shutdown/kexec cases: * set_cpus_allowed_ptr() to limit code execution to a single CPU. * disable_nonboot_cpus() (or equivalent) if it's available to turn off all the other CPUs. The issue here would be that disable_nonboot_cpus() isn't always available; I assume that's part of the reason that there are arch-specific machine_xxx() hooks, so that architectures can power-off/reset their CPUs even when hotplug isn't enabled? I wonder if that can be refactored so that reboot/poweroff/kexec can share some CPU-disable code with CPU hotplug?
diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c index f79dd1e..1893bda 100644 --- a/arch/arm/kernel/process.c +++ b/arch/arm/kernel/process.c @@ -239,6 +239,7 @@ __setup("reboot=", reboot_setup); void machine_shutdown(void) { + disable_nonboot_cpus(); #ifdef CONFIG_SMP smp_send_stop(); #endif