diff mbox

[v3,1/2] ARM: tango: add HOTPLUG_CPU support

Message ID 577266AD.8000006@sigmadesigns.com (mailing list archive)
State New, archived
Headers show

Commit Message

Marc Gonzalez June 28, 2016, 11:59 a.m. UTC
cpu_die() and cpu_kill() are implemented in firmware.

Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
---
 arch/arm/mach-tango/platsmp.c | 30 ++++++++++++++++++++++++++++++
 arch/arm/mach-tango/smc.h     |  4 +++-
 2 files changed, 33 insertions(+), 1 deletion(-)

Comments

Robin Murphy June 28, 2016, 12:30 p.m. UTC | #1
On 28/06/16 12:59, Marc Gonzalez wrote:
> cpu_die() and cpu_kill() are implemented in firmware.
> 
> Signed-off-by: Marc Gonzalez <marc_gonzalez@sigmadesigns.com>
> ---
>  arch/arm/mach-tango/platsmp.c | 30 ++++++++++++++++++++++++++++++
>  arch/arm/mach-tango/smc.h     |  4 +++-
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mach-tango/platsmp.c b/arch/arm/mach-tango/platsmp.c
> index a21f55e000d2..a24b9b1d0b1a 100644
> --- a/arch/arm/mach-tango/platsmp.c
> +++ b/arch/arm/mach-tango/platsmp.c
> @@ -1,3 +1,4 @@
> +#include <linux/delay.h>
>  #include <linux/init.h>
>  #include <linux/smp.h>
>  #include "smc.h"
> @@ -9,8 +10,37 @@ static int tango_boot_secondary(unsigned int cpu, struct task_struct *idle)
>  	return 0;
>  }
>  
> +#ifdef CONFIG_HOTPLUG_CPU
> +/*
> + * cpu_kill() and cpu_die() run concurrently on different cores.
> + * Firmware will only "kill" a core once it has properly "died".
> + * Keep trying to kill a core until the operation succeeds, but
> + * sleep between tries to give the core time to die.
> + */
> +static int tango_cpu_kill(unsigned int cpu)
> +{
> +	do {
> +		msleep(10);
> +	} while (tango_aux_core_kill(cpu) != 0);

Does the firmware guarantee that this will succeed (or at least report
success) in finite time, regardless of how messed up the system might
be? I'd imagine this should probably have either a timeout or a comment
clarifying why it doesn't need a timeout.

Robin.

> +
> +	return 1;
> +}
> +
> +static void tango_cpu_die(unsigned int cpu)
> +{
> +	do {
> +		cpu_relax();
> +	} while (tango_aux_core_die(cpu) != 0);
> +}
> +#else
> +#define tango_cpu_kill	NULL
> +#define tango_cpu_die	NULL
> +#endif
> +
>  static const struct smp_operations tango_smp_ops __initconst = {
>  	.smp_boot_secondary	= tango_boot_secondary,
> +	.cpu_kill		= tango_cpu_kill,
> +	.cpu_die		= tango_cpu_die,
>  };
>  
>  CPU_METHOD_OF_DECLARE(tango4_smp, "sigma,tango4-smp", &tango_smp_ops);
> diff --git a/arch/arm/mach-tango/smc.h b/arch/arm/mach-tango/smc.h
> index 7a4af35cc390..3d31f984f44c 100644
> --- a/arch/arm/mach-tango/smc.h
> +++ b/arch/arm/mach-tango/smc.h
> @@ -2,4 +2,6 @@ extern int tango_smc(unsigned int val, unsigned int service);
>  
>  #define tango_set_l2_control(val)	tango_smc(val, 0x102)
>  #define tango_start_aux_core(val)	tango_smc(val, 0x104)
> -#define tango_set_aux_boot_addr(val)	tango_smc((unsigned int)val, 0x105)
> +#define tango_set_aux_boot_addr(val)	tango_smc(val, 0x105)
> +#define tango_aux_core_die(val)		tango_smc(val, 0x121)
> +#define tango_aux_core_kill(val)	tango_smc(val, 0x122)
>
Marc Gonzalez June 28, 2016, 3:04 p.m. UTC | #2
On 28/06/2016 14:30, Robin Murphy wrote:

> On 28/06/16 12:59, Marc Gonzalez wrote:
>
>> +#ifdef CONFIG_HOTPLUG_CPU
>> +/*
>> + * cpu_kill() and cpu_die() run concurrently on different cores.
>> + * Firmware will only "kill" a core once it has properly "died".
>> + * Keep trying to kill a core until the operation succeeds, but
>> + * sleep between tries to give the core time to die.
>> + */
>> +static int tango_cpu_kill(unsigned int cpu)
>> +{
>> +	do {
>> +		msleep(10);
>> +	} while (tango_aux_core_kill(cpu) != 0);
> 
> Does the firmware guarantee that this will succeed (or at least report
> success) in finite time, regardless of how messed up the system might
> be? I'd imagine this should probably have either a timeout or a comment
> clarifying why it doesn't need a timeout.

Good point.

The FW allows only one thread at a time. If a thread is wedged inside
the FW, no other thread can use the FW. In that situation, cpu0 would
remain stuck inside tango_cpu_kill().

Note, that if tango_cpu_kill() starts failing, then secondary cores
will remain "zombies". So the system is mostly hosed anyway...
Only cpu0 will be available.

Regards.
Robin Murphy June 28, 2016, 3:16 p.m. UTC | #3
On 28/06/16 16:04, Marc Gonzalez wrote:
> On 28/06/2016 14:30, Robin Murphy wrote:
> 
>> On 28/06/16 12:59, Marc Gonzalez wrote:
>>
>>> +#ifdef CONFIG_HOTPLUG_CPU
>>> +/*
>>> + * cpu_kill() and cpu_die() run concurrently on different cores.
>>> + * Firmware will only "kill" a core once it has properly "died".
>>> + * Keep trying to kill a core until the operation succeeds, but
>>> + * sleep between tries to give the core time to die.
>>> + */
>>> +static int tango_cpu_kill(unsigned int cpu)
>>> +{
>>> +	do {
>>> +		msleep(10);
>>> +	} while (tango_aux_core_kill(cpu) != 0);
>>
>> Does the firmware guarantee that this will succeed (or at least report
>> success) in finite time, regardless of how messed up the system might
>> be? I'd imagine this should probably have either a timeout or a comment
>> clarifying why it doesn't need a timeout.
> 
> Good point.
> 
> The FW allows only one thread at a time. If a thread is wedged inside
> the FW, no other thread can use the FW. In that situation, cpu0 would
> remain stuck inside tango_cpu_kill().
> 
> Note, that if tango_cpu_kill() starts failing, then secondary cores
> will remain "zombies". So the system is mostly hosed anyway...
> Only cpu0 will be available.

Indeed; my thought was that if CPU1 somehow ends up wedged such that
tango_aux_core_die() never completes, then CPU0 eventually timing out
and being able to limp through a clean(ish) reboot is probably
preferable to spinning in cpu_kill() forever.

Robin.

> 
> Regards.
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
>
Marc Gonzalez June 29, 2016, 12:28 p.m. UTC | #4
On 28/06/2016 17:16, Robin Murphy wrote:

> On 28/06/16 16:04, Marc Gonzalez wrote:
>
>> On 28/06/2016 14:30, Robin Murphy wrote:
>>
>>> Does the firmware guarantee that this will succeed (or at least report
>>> success) in finite time, regardless of how messed up the system might
>>> be? I'd imagine this should probably have either a timeout or a comment
>>> clarifying why it doesn't need a timeout.
>>
>> Good point.
>>
>> The FW allows only one thread at a time. If a thread is wedged inside
>> the FW, no other thread can use the FW. In that situation, cpu0 would
>> remain stuck inside tango_cpu_kill().
>>
>> Note, that if tango_cpu_kill() starts failing, then secondary cores
>> will remain "zombies". So the system is mostly hosed anyway...
>> Only cpu0 will be available.
> 
> Indeed; my thought was that if CPU1 somehow ends up wedged such that
> tango_aux_core_die() never completes, then CPU0 eventually timing out
> and being able to limp through a clean(ish) reboot is probably
> preferable to spinning in cpu_kill() forever.

I have sent an updated patch addressing your comment.
Thanks for flagging the issue.

Regards.
diff mbox

Patch

diff --git a/arch/arm/mach-tango/platsmp.c b/arch/arm/mach-tango/platsmp.c
index a21f55e000d2..a24b9b1d0b1a 100644
--- a/arch/arm/mach-tango/platsmp.c
+++ b/arch/arm/mach-tango/platsmp.c
@@ -1,3 +1,4 @@ 
+#include <linux/delay.h>
 #include <linux/init.h>
 #include <linux/smp.h>
 #include "smc.h"
@@ -9,8 +10,37 @@  static int tango_boot_secondary(unsigned int cpu, struct task_struct *idle)
 	return 0;
 }
 
+#ifdef CONFIG_HOTPLUG_CPU
+/*
+ * cpu_kill() and cpu_die() run concurrently on different cores.
+ * Firmware will only "kill" a core once it has properly "died".
+ * Keep trying to kill a core until the operation succeeds, but
+ * sleep between tries to give the core time to die.
+ */
+static int tango_cpu_kill(unsigned int cpu)
+{
+	do {
+		msleep(10);
+	} while (tango_aux_core_kill(cpu) != 0);
+
+	return 1;
+}
+
+static void tango_cpu_die(unsigned int cpu)
+{
+	do {
+		cpu_relax();
+	} while (tango_aux_core_die(cpu) != 0);
+}
+#else
+#define tango_cpu_kill	NULL
+#define tango_cpu_die	NULL
+#endif
+
 static const struct smp_operations tango_smp_ops __initconst = {
 	.smp_boot_secondary	= tango_boot_secondary,
+	.cpu_kill		= tango_cpu_kill,
+	.cpu_die		= tango_cpu_die,
 };
 
 CPU_METHOD_OF_DECLARE(tango4_smp, "sigma,tango4-smp", &tango_smp_ops);
diff --git a/arch/arm/mach-tango/smc.h b/arch/arm/mach-tango/smc.h
index 7a4af35cc390..3d31f984f44c 100644
--- a/arch/arm/mach-tango/smc.h
+++ b/arch/arm/mach-tango/smc.h
@@ -2,4 +2,6 @@  extern int tango_smc(unsigned int val, unsigned int service);
 
 #define tango_set_l2_control(val)	tango_smc(val, 0x102)
 #define tango_start_aux_core(val)	tango_smc(val, 0x104)
-#define tango_set_aux_boot_addr(val)	tango_smc((unsigned int)val, 0x105)
+#define tango_set_aux_boot_addr(val)	tango_smc(val, 0x105)
+#define tango_aux_core_die(val)		tango_smc(val, 0x121)
+#define tango_aux_core_kill(val)	tango_smc(val, 0x122)