x86/power: Fix 'nosmt' vs. hibernation triple fault during resume

Message ID nycvar.YFH.7.76.1905282326360.1962@cbobk.fhfr.pm (mailing list archive)
State Superseded, archived

Commit Message

Jiri Kosina May 28, 2019, 9:31 p.m. UTC
From: Jiri Kosina <jkosina@suse.cz>

As explained in

	0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")

we always, no matter what, have to bring up x86 HT siblings during boot at 
least once in order to avoid first MCE bringing the system to its knees.

That means that whenever 'nosmt' is supplied on the kernel command-line,
all the HT siblings are as a result sitting in mwait or cpuidle after
going through the online-offline cycle at least once.

This causes a serious issue though when a kernel, which saw 'nosmt' on its 
commandline, is going to perform resume from hibernation: if the resume 
from the hibernated image is successful, cr3 is flipped in order to point 
to the address space of the kernel that is being resumed, which in turn 
means that all the HT siblings are all of a sudden mwaiting on an address
which is no longer valid.

That results in triple fault shortly after cr3 is switched, and machine 
reboots.

Fix this by always waking up all the SMT siblings before initiating the 
'restore from hibernation' process; this guarantees that all the HT 
siblings will be properly carried over to the resumed kernel waiting in 
resume_play_dead(), and acted upon accordingly afterwards, based on the 
target kernel configuration.

Cc: stable@vger.kernel.org # v4.19+
Debugged-by: Thomas Gleixner <tglx@linutronix.de>
Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
---
 arch/x86/power/cpu.c | 11 +++++++++++
 include/linux/cpu.h  |  2 ++
 kernel/cpu.c         |  2 +-
 3 files changed, 14 insertions(+), 1 deletion(-)
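
The ordering argument in the commit message can be illustrated with a small
user-space model. This is only a sketch, not kernel code: every model_* name
below is hypothetical, and the three states simply label "online", "offlined
into mwait/cpuidle by 'nosmt'", and "parked via resume_play_dead()". It
assumes, as the commit message implies, that only CPUs which are online go
through the play_dead path when non-boot CPUs are disabled.

```c
/* User-space model only. All model_* names are hypothetical. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_cpu_state {
	MODEL_ONLINE,		/* running normally */
	MODEL_PARKED_IDLE,	/* offlined by 'nosmt', sitting in mwait/cpuidle */
	MODEL_PARKED_RESUME,	/* parked via resume_play_dead() */
};

#define MODEL_NR_CPUS 4

struct model_system {
	enum model_cpu_state cpu[MODEL_NR_CPUS];	/* cpu[0] is the boot CPU */
};

/* A system booted with 'nosmt': CPUs 1 and 3 are the offlined siblings. */
struct model_system model_boot_with_nosmt(void)
{
	struct model_system sys = {
		.cpu = { MODEL_ONLINE, MODEL_PARKED_IDLE,
			 MODEL_ONLINE, MODEL_PARKED_IDLE },
	};
	return sys;
}

/* Stands in for cpuhp_smt_enable(): wake every idle-parked sibling. */
int model_smt_enable(struct model_system *sys)
{
	for (size_t i = 0; i < MODEL_NR_CPUS; i++)
		if (sys->cpu[i] == MODEL_PARKED_IDLE)
			sys->cpu[i] = MODEL_ONLINE;
	return 0;
}

/*
 * Stands in for disable_nonboot_cpus() while play_dead points at
 * resume_play_dead: only CPUs that are online get re-parked.
 */
int model_disable_nonboot_cpus(struct model_system *sys)
{
	for (size_t i = 1; i < MODEL_NR_CPUS; i++)
		if (sys->cpu[i] == MODEL_ONLINE)
			sys->cpu[i] = MODEL_PARKED_RESUME;
	return 0;
}

/* True when no non-boot CPU is still waiting in the old address space. */
bool model_safe_to_switch_cr3(const struct model_system *sys)
{
	for (size_t i = 1; i < MODEL_NR_CPUS; i++)
		if (sys->cpu[i] == MODEL_PARKED_IDLE)
			return false;
	return true;
}
```

In this model, calling model_disable_nonboot_cpus() alone leaves CPUs 1 and 3
in MODEL_PARKED_IDLE, which corresponds to the triple-fault case; calling
model_smt_enable() first leaves every non-boot CPU in MODEL_PARKED_RESUME.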

Comments

Rafael J. Wysocki May 29, 2019, 8:06 a.m. UTC | #1
On Tue, May 28, 2019 at 11:31 PM Jiri Kosina <jikos@kernel.org> wrote:
>
> From: Jiri Kosina <jkosina@suse.cz>
>
> As explained in
>
>         0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
>
> we always, no matter what, have to bring up x86 HT siblings during boot at
> least once in order to avoid first MCE bringing the system to its knees.
>
> That means that whenever 'nosmt' is supplied on the kernel command-line,
> all the HT siblings are as a result sitting in mwait or cpuidle after
> going through the online-offline cycle at least once.
>
> This causes a serious issue though when a kernel, which saw 'nosmt' on its
> commandline, is going to perform resume from hibernation: if the resume
> from the hibernated image is successful, cr3 is flipped in order to point
> to the address space of the kernel that is being resumed, which in turn
> means that all the HT siblings are all of a sudden mwaiting on an address
> which is no longer valid.
>
> That results in triple fault shortly after cr3 is switched, and machine
> reboots.
>
> Fix this by always waking up all the SMT siblings before initiating the
> 'restore from hibernation' process; this guarantees that all the HT
> siblings will be properly carried over to the resumed kernel waiting in
> resume_play_dead(), and acted upon accordingly afterwards, based on the
> target kernel configuration.
>
> Cc: stable@vger.kernel.org # v4.19+
> Debugged-by: Thomas Gleixner <tglx@linutronix.de>
> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>

I can take this or, in case it is better to route it through x86:

Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>

> ---
>  arch/x86/power/cpu.c | 11 +++++++++++
>  include/linux/cpu.h  |  2 ++
>  kernel/cpu.c         |  2 +-
>  3 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index a7d966964c6f..bde8ce1f6c6c 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -299,9 +299,20 @@ int hibernate_resume_nonboot_cpu_disable(void)
>          * address in its instruction pointer may not be possible to resolve
>          * any more at that point (the page tables used by it previously may
>          * have been overwritten by hibernate image data).
> +        *
> +        * First, make sure that we wake up all the potentially disabled SMT
> +        * threads which have been initially brought up and then put into
> +        * mwait/cpuidle sleep.
> +        * Those will be put to proper (not interfering with hibernation
> +        * resume) sleep afterwards, and the resumed kernel will decide itself
> +        * what to do with them.
>          */
>         smp_ops.play_dead = resume_play_dead;
> +       ret = cpuhp_smt_enable();
> +       if (ret)
> +               goto out;
>         ret = disable_nonboot_cpus();
> +out:
>         smp_ops.play_dead = play_dead;
>         return ret;
>  }
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 3813fe45effd..b5523552a607 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -201,10 +201,12 @@ enum cpuhp_smt_control {
>  extern enum cpuhp_smt_control cpu_smt_control;
>  extern void cpu_smt_disable(bool force);
>  extern void cpu_smt_check_topology(void);
> +extern int cpuhp_smt_enable(void);
>  #else
>  # define cpu_smt_control               (CPU_SMT_NOT_IMPLEMENTED)
>  static inline void cpu_smt_disable(bool force) { }
>  static inline void cpu_smt_check_topology(void) { }
> +static inline int cpuhp_smt_enable(void) { return 0; }
>  #endif
>
>  /*
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f2ef10460698..3ff5ce0e4132 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2093,7 +2093,7 @@ static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
>         return ret;
>  }
>
> -static int cpuhp_smt_enable(void)
> +int cpuhp_smt_enable(void)
>  {
>         int cpu, ret = 0;
>
>
> --
> Jiri Kosina
> SUSE Labs

Peter Zijlstra May 29, 2019, 9:03 a.m. UTC | #2
On Tue, May 28, 2019 at 11:31:45PM +0200, Jiri Kosina wrote:

> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index a7d966964c6f..bde8ce1f6c6c 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -299,9 +299,20 @@ int hibernate_resume_nonboot_cpu_disable(void)
>  	 * address in its instruction pointer may not be possible to resolve
>  	 * any more at that point (the page tables used by it previously may
>  	 * have been overwritten by hibernate image data).
> +	 *
> +	 * First, make sure that we wake up all the potentially disabled SMT
> +	 * threads which have been initially brought up and then put into
> +	 * mwait/cpuidle sleep.
> +	 * Those will be put to proper (not interfering with hibernation
> +	 * resume) sleep afterwards, and the resumed kernel will decide itself
> +	 * what to do with them.
>  	 */
>  	smp_ops.play_dead = resume_play_dead;

Oooh, teh yuck!, but this explains my confusion from the other thread.

> +	ret = cpuhp_smt_enable();
> +	if (ret)
> +		goto out;
>  	ret = disable_nonboot_cpus();
> +out:
>  	smp_ops.play_dead = play_dead;
>  	return ret;
>  }

I think you can avoid the goto like:

	ret = cpuhp_smt_enable();
	if (ret)
		return ret;

	smp_ops.play_dead = resume_play_dead;
	ret = disable_nonboot_cpus();
	smp_ops.play_dead = play_dead;
	return ret;

We don't need the play dead change to online CPUs.
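
The reordering suggested above can be checked in isolation with a user-space
sketch. This is not kernel code: the flow_* names, the struct, and the stub
return values are all hypothetical stand-ins for smp_ops.play_dead,
cpuhp_smt_enable() and disable_nonboot_cpus(). It only demonstrates the
control flow: an early return on failure never touches the handler, and the
override is in place only around the disable step.

```c
/* User-space sketch only. All flow_* names are hypothetical. */
#include <assert.h>

enum flow_handler {
	FLOW_PLAY_DEAD,		/* stands in for play_dead */
	FLOW_RESUME_PLAY_DEAD,	/* stands in for resume_play_dead */
};

struct flow_env {
	enum flow_handler play_dead;	/* stands in for smp_ops.play_dead */
	int smt_enable_result;		/* what the SMT-enable stub returns */
	int disable_result;		/* what the disable stub returns */
	enum flow_handler seen_during_disable;	/* recorded by the stub */
	int disable_calls;
};

/* Stub for cpuhp_smt_enable(): just reports the configured result. */
static int flow_smt_enable(struct flow_env *env)
{
	return env->smt_enable_result;
}

/* Stub for disable_nonboot_cpus(): records which handler was active. */
static int flow_disable_nonboot_cpus(struct flow_env *env)
{
	env->seen_during_disable = env->play_dead;
	env->disable_calls++;
	return env->disable_result;
}

/* Same shape as the goto-free variant proposed in the review. */
int flow_resume_nonboot_cpu_disable(struct flow_env *env)
{
	int ret;

	ret = flow_smt_enable(env);
	if (ret)
		return ret;	/* handler was never overridden */

	env->play_dead = FLOW_RESUME_PLAY_DEAD;
	ret = flow_disable_nonboot_cpus(env);
	env->play_dead = FLOW_PLAY_DEAD;
	return ret;
}
```

With a non-zero smt_enable_result the function returns that value without
calling the disable stub; with zero, the stub observes FLOW_RESUME_PLAY_DEAD
and the handler is restored to FLOW_PLAY_DEAD before returning.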
Patch

diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index a7d966964c6f..bde8ce1f6c6c 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -299,9 +299,20 @@  int hibernate_resume_nonboot_cpu_disable(void)
 	 * address in its instruction pointer may not be possible to resolve
 	 * any more at that point (the page tables used by it previously may
 	 * have been overwritten by hibernate image data).
+	 *
+	 * First, make sure that we wake up all the potentially disabled SMT
+	 * threads which have been initially brought up and then put into
+	 * mwait/cpuidle sleep.
+	 * Those will be put to proper (not interfering with hibernation
+	 * resume) sleep afterwards, and the resumed kernel will decide itself
+	 * what to do with them.
 	 */
 	smp_ops.play_dead = resume_play_dead;
+	ret = cpuhp_smt_enable();
+	if (ret)
+		goto out;
 	ret = disable_nonboot_cpus();
+out:
 	smp_ops.play_dead = play_dead;
 	return ret;
 }
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 3813fe45effd..b5523552a607 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -201,10 +201,12 @@  enum cpuhp_smt_control {
 extern enum cpuhp_smt_control cpu_smt_control;
 extern void cpu_smt_disable(bool force);
 extern void cpu_smt_check_topology(void);
+extern int cpuhp_smt_enable(void);
 #else
 # define cpu_smt_control		(CPU_SMT_NOT_IMPLEMENTED)
 static inline void cpu_smt_disable(bool force) { }
 static inline void cpu_smt_check_topology(void) { }
+static inline int cpuhp_smt_enable(void) { return 0; }
 #endif
 
 /*
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f2ef10460698..3ff5ce0e4132 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2093,7 +2093,7 @@  static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
 	return ret;
 }
 
-static int cpuhp_smt_enable(void)
+int cpuhp_smt_enable(void)
 {
 	int cpu, ret = 0;