ARM: kexec: validate CPU hotplug support

Message ID: 1373582931-11956-1-git-send-email-swarren@wwwdotorg.org
State: New, archived

Commit Message

Stephen Warren July 11, 2013, 10:48 p.m. UTC
From: Stephen Warren <swarren@nvidia.com>

Architectures should fully validate whether kexec is possible as part of
machine_kexec_prepare(), so that user-space's kexec_load() operation can
report any problems. Performing validation in machine_kexec() itself is
too late, since it is not allowed to return.

Prior to this patch, ARM's machine_kexec() was testing after-the-fact
whether machine_kexec_prepare() was able to disable all but one CPU.
Instead, modify machine_kexec_prepare() to validate all conditions
necessary for machine_kexec() to succeed. BUG if the validation
succeeded, yet disabling the CPUs didn't actually work.

Signed-off-by: Stephen Warren <swarren@nvidia.com>
---
Russell, does it make sense for this to be cc: stable as a follow-up to
19ab428 "ARM: 7759/1: decouple CPU offlining from reboot/shutdown"?

 arch/arm/include/asm/smp_plat.h |  3 +++
 arch/arm/kernel/machine_kexec.c | 20 ++++++++++++++++----
 arch/arm/kernel/smp.c           |  8 ++++++++
 3 files changed, 27 insertions(+), 4 deletions(-)
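
The split described in the commit message can be illustrated with a small user-space model. This is only a sketch, not kernel code: `struct fake_platform` and every `fake_*` function are hypothetical stand-ins for `num_possible_cpus()`, `platform_can_cpu_hotplug()`, `machine_shutdown()` and `machine_kexec()`, and `assert()` stands in for `BUG_ON()`.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for kernel state; not a real kernel structure. */
struct fake_platform {
	int possible_cpus;	/* models num_possible_cpus() */
	int online_cpus;	/* models num_online_cpus() */
	bool can_hotplug;	/* models platform_can_cpu_hotplug() */
};

/* Models machine_kexec_prepare(): it can return an error, so validate here. */
static int fake_kexec_prepare(const struct fake_platform *p)
{
	if (p->possible_cpus > 1 && !p->can_hotplug)
		return -EINVAL;
	return 0;
}

/* Models machine_shutdown(): offlines secondary CPUs when hotplug works. */
static void fake_shutdown(struct fake_platform *p)
{
	if (p->can_hotplug)
		p->online_cpus = 1;
}

/* Models machine_kexec(): it cannot report failure, so it only asserts. */
static void fake_kexec(const struct fake_platform *p)
{
	assert(p->online_cpus <= 1);	/* stands in for BUG_ON() */
	/* The real function would jump to the new kernel here. */
}

/* Models the load-then-execute sequence; returns 0 or a negative errno. */
static int fake_kexec_flow(struct fake_platform *p)
{
	int err = fake_kexec_prepare(p);

	if (err)
		return err;	/* user space sees the failure at load time */

	fake_shutdown(p);
	fake_kexec(p);
	return 0;
}
```

In this model an SMP platform without hotplug support is rejected by the prepare step, so the assertion in `fake_kexec()` is never reached for a configuration that the prepare step should have caught.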

Comments

Stephen Warren July 19, 2013, 3:18 p.m. UTC | #1
On 07/11/2013 04:48 PM, Stephen Warren wrote:
> From: Stephen Warren <swarren@nvidia.com>
> 
> Architectures should fully validate whether kexec is possible as part of
> machine_kexec_prepare(), so that user-space's kexec_load() operation can
> report any problems. Performing validation in machine_kexec() itself is
> too late, since it is not allowed to return.
> 
> Prior to this patch, ARM's machine_kexec() was testing after-the-fact
> whether machine_kexec_prepare() was able to disable all but one CPU.
> Instead, modify machine_kexec_prepare() to validate all conditions
> necessary for machine_kexec_prepare()'s to succeed. BUG if the validation
> succeeded, yet disabling the CPUs didn't actually work.

Russell, does this look good to put into the ARM patch tracker?

Eric W. Biederman July 21, 2013, 1:40 a.m. UTC | #2
Stephen Warren <swarren@wwwdotorg.org> writes:

> From: Stephen Warren <swarren@nvidia.com>
>
> Architectures should fully validate whether kexec is possible as part of
> machine_kexec_prepare(), so that user-space's kexec_load() operation can
> report any problems. Performing validation in machine_kexec() itself is
> too late, since it is not allowed to return.
>
> Prior to this patch, ARM's machine_kexec() was testing after-the-fact
> whether machine_kexec_prepare() was able to disable all but one CPU.
> Instead, modify machine_kexec_prepare() to validate all conditions
> necessary for machine_kexec_prepare()'s to succeed. BUG if the validation
> succeeded, yet disabling the CPUs didn't actually work.
>
> Signed-off-by: Stephen Warren <swarren@nvidia.com>

At a quick skim this looks good to me.

Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>

Stephen Warren July 30, 2013, 9:40 p.m. UTC | #3
On 07/19/2013 09:18 AM, Stephen Warren wrote:
> On 07/11/2013 04:48 PM, Stephen Warren wrote:
>> From: Stephen Warren <swarren@nvidia.com>
>>
>> Architectures should fully validate whether kexec is possible as part of
>> machine_kexec_prepare(), so that user-space's kexec_load() operation can
>> report any problems. Performing validation in machine_kexec() itself is
>> too late, since it is not allowed to return.
>>
>> Prior to this patch, ARM's machine_kexec() was testing after-the-fact
>> whether machine_kexec_prepare() was able to disable all but one CPU.
>> Instead, modify machine_kexec_prepare() to validate all conditions
>> necessary for machine_kexec_prepare()'s to succeed. BUG if the validation
>> succeeded, yet disabling the CPUs didn't actually work.
> 
> Russell, does this look good to put into the ARM patch tracker?

I put this in the patch tracker since I assume that no response means no
objection.

Russell King - ARM Linux Aug. 1, 2013, 1:41 p.m. UTC | #4
On Thu, Jul 11, 2013 at 04:48:51PM -0600, Stephen Warren wrote:
> +int platform_can_cpu_hotplug(void)
> +{
> +	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) || !smp_ops.cpu_kill)
> +		return 0;

This is an inappropriate usage of IS_ENABLED().  When hotplug CPU is
disabled, there is no cpu_kill member in smp_ops, so this leads to
build failure.

Dropping your patch.
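
The build failure comes from how IS_ENABLED() works: it folds to a constant 0 when the option is off, but the compiler still has to parse and type-check `smp_ops.cpu_kill`, and a missing struct member is an error even in dead code. One possible way to avoid this is a preprocessor guard, so the member access is never compiled when hotplug is off. The following is a user-space sketch under stated assumptions, not necessarily what was eventually merged: `struct smp_operations` here is a simplified stand-in, and `fake_cpu_kill()` and `self_check()` are hypothetical helpers.

```c
#include <assert.h>
#include <stddef.h>

/* Models CONFIG_HOTPLUG_CPU=y; delete this line to model hotplug disabled. */
#define CONFIG_HOTPLUG_CPU 1

/* Simplified, hypothetical stand-in for the kernel's struct smp_operations. */
struct smp_operations {
	void (*smp_init_cpus)(void);
#ifdef CONFIG_HOTPLUG_CPU
	int (*cpu_kill)(unsigned int cpu);	/* absent when hotplug is off */
#endif
};

static struct smp_operations smp_ops;

/* The #ifdef keeps the cpu_kill access out of non-hotplug builds entirely. */
int platform_can_cpu_hotplug(void)
{
#ifdef CONFIG_HOTPLUG_CPU
	if (smp_ops.cpu_kill)
		return 1;
#endif
	return 0;
}

#ifdef CONFIG_HOTPLUG_CPU
/* Hypothetical platform hook, present only to exercise the check. */
static int fake_cpu_kill(unsigned int cpu)
{
	(void)cpu;
	return 1;
}

/* Exercises both outcomes: no hook registered, then a hook registered. */
static void self_check(void)
{
	smp_ops.cpu_kill = NULL;
	assert(platform_can_cpu_hotplug() == 0);

	smp_ops.cpu_kill = fake_cpu_kill;
	assert(platform_can_cpu_hotplug() == 1);
}
#endif
```

With the `#define` removed, the struct has no `cpu_kill` member, the guarded code disappears, and `platform_can_cpu_hotplug()` still compiles and returns 0.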

Patch

diff --git a/arch/arm/include/asm/smp_plat.h b/arch/arm/include/asm/smp_plat.h
index 6462a72..a252c0b 100644
--- a/arch/arm/include/asm/smp_plat.h
+++ b/arch/arm/include/asm/smp_plat.h
@@ -88,4 +88,7 @@  static inline u32 mpidr_hash_size(void)
 {
 	return 1 << mpidr_hash.bits;
 }
+
+extern int platform_can_cpu_hotplug(void);
+
 #endif
diff --git a/arch/arm/kernel/machine_kexec.c b/arch/arm/kernel/machine_kexec.c
index 4fb074c..d7c82df 100644
--- a/arch/arm/kernel/machine_kexec.c
+++ b/arch/arm/kernel/machine_kexec.c
@@ -15,6 +15,7 @@ 
 #include <asm/mmu_context.h>
 #include <asm/cacheflush.h>
 #include <asm/mach-types.h>
+#include <asm/smp_plat.h>
 #include <asm/system_misc.h>
 
 extern const unsigned char relocate_new_kernel[];
@@ -39,6 +40,14 @@  int machine_kexec_prepare(struct kimage *image)
 	int i, err;
 
 	/*
+	 * Validate that if the current HW supports SMP, then the SW supports
+	 * and implements CPU hotplug for the current HW. If not, we won't be
+	 * able to kexec reliably, so fail the prepare operation.
+	 */
+	if (num_possible_cpus() > 1 && !platform_can_cpu_hotplug())
+		return -EINVAL;
+
+	/*
 	 * No segment at default ATAGs address. try to locate
 	 * a dtb using magic.
 	 */
@@ -134,10 +143,13 @@  void machine_kexec(struct kimage *image)
 	unsigned long reboot_code_buffer_phys;
 	void *reboot_code_buffer;
 
-	if (num_online_cpus() > 1) {
-		pr_err("kexec: error: multiple CPUs still online\n");
-		return;
-	}
+	/*
+	 * This can only happen if machine_shutdown() failed to disable some
+	 * CPU, and that can only happen if the checks in
+	 * machine_kexec_prepare() were not correct. If this fails, we can't
+	 * reliably kexec anyway, so BUG_ON is appropriate.
+	 */
+	BUG_ON(num_online_cpus() > 1);
 
 	page_list = image->head & PAGE_MASK;
 
diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
index c2b4f8f..5b9e501 100644
--- a/arch/arm/kernel/smp.c
+++ b/arch/arm/kernel/smp.c
@@ -145,6 +145,14 @@  int boot_secondary(unsigned int cpu, struct task_struct *idle)
 	return -ENOSYS;
 }
 
+int platform_can_cpu_hotplug(void)
+{
+	if (!IS_ENABLED(CONFIG_HOTPLUG_CPU) || !smp_ops.cpu_kill)
+		return 0;
+
+	return 1;
+}
+
 #ifdef CONFIG_HOTPLUG_CPU
 static void percpu_timer_stop(void);