Message ID | nycvar.YFH.7.76.1905292223500.1962@cbobk.fhfr.pm (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
Series | [v3] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume | expand |
On Wed 2019-05-29 22:26:48, Jiri Kosina wrote: > From: Jiri Kosina <jkosina@suse.cz> > > As explained in > > 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > we always, no matter what, have to bring up x86 HT siblings during boot at > least once in order to avoid first MCE bringing the system to its > knees. > Cc: stable@vger.kernel.org # v4.19+ > Debugged-by: Thomas Gleixner <tglx@linutronix.de> > Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Pavel Machek <pavel@ucw.cz> But I'm less sure if this is -stable material. Is reverting 0cc3cd21657be04cb0559fe8063f2130493f92cf in -stable an option? Pavel
On Wed, 29 May 2019, Pavel Machek wrote: > > As explained in > > > > 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > > > we always, no matter what, have to bring up x86 HT siblings during boot at > > least once in order to avoid first MCE bringing the system to its > > knees. > > > > Cc: stable@vger.kernel.org # v4.19+ > > Debugged-by: Thomas Gleixner <tglx@linutronix.de> > > Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > Signed-off-by: Jiri Kosina <jkosina@suse.cz> > > Acked-by: Pavel Machek <pavel@ucw.cz> > > But I'm less sure if this is -stable material. Is reverting > 0cc3cd21657be04cb0559fe8063f2130493f92cf in -stable an option? Well, without that commit, first MCE to come kills nosmt system.
On Wed 2019-05-29 23:27:34, Jiri Kosina wrote: > On Wed, 29 May 2019, Pavel Machek wrote: > > > > As explained in > > > > > > 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > > > > > we always, no matter what, have to bring up x86 HT siblings during boot at > > > least once in order to avoid first MCE bringing the system to its > > > knees. > > > > > > > Cc: stable@vger.kernel.org # v4.19+ > > > Debugged-by: Thomas Gleixner <tglx@linutronix.de> > > > Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > > Signed-off-by: Jiri Kosina <jkosina@suse.cz> > > > > Acked-by: Pavel Machek <pavel@ucw.cz> > > > > But I'm less sure if this is -stable material. Is reverting > > 0cc3cd21657be04cb0559fe8063f2130493f92cf in -stable an option? > > Well, without that commit, first MCE to come kills nosmt system. ...which is same situation 4.9-stable and 4.14-stable lives with for a long long time, right? But you are right, reverting 0cc3cd21657b does not look like good option :-(. Pavel
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c index a7d966964c6f..513ce09e9950 100644 --- a/arch/x86/power/cpu.c +++ b/arch/x86/power/cpu.c @@ -299,7 +299,17 @@ int hibernate_resume_nonboot_cpu_disable(void) * address in its instruction pointer may not be possible to resolve * any more at that point (the page tables used by it previously may * have been overwritten by hibernate image data). + * + * First, make sure that we wake up all the potentially disabled SMT + * threads which have been initially brought up and then put into + * mwait/cpuidle sleep. + * Those will be put to proper (not interfering with hibernation + * resume) sleep afterwards, and the resumed kernel will decide itself + * what to do with them. */ + ret = cpuhp_smt_enable(); + if (ret) + return ret; smp_ops.play_dead = resume_play_dead; ret = disable_nonboot_cpus(); smp_ops.play_dead = play_dead; diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c index 4845b8c7be7f..fc413717a45f 100644 --- a/arch/x86/power/hibernate.c +++ b/arch/x86/power/hibernate.c @@ -11,6 +11,7 @@ #include <linux/suspend.h> #include <linux/scatterlist.h> #include <linux/kdebug.h> +#include <linux/cpu.h> #include <crypto/hash.h> @@ -245,3 +246,35 @@ int relocate_restore_code(void) __flush_tlb_all(); return 0; } + +int arch_resume_nosmt(void) +{ + int ret = 0; + /* + * We reached this while coming out of hibernation. This means + * that SMT siblings are sleeping in hlt, as mwait is not safe + * against control transition during resume (see comment in + * hibernate_resume_nonboot_cpu_disable()). + * + * If the resumed kernel has SMT disabled, we have to take all the + * SMT siblings out of hlt, and offline them again so that they + * end up in mwait proper. + * + * Called with hotplug disabled. + */ + cpu_hotplug_enable(); + if (cpu_smt_control == CPU_SMT_DISABLED || + cpu_smt_control == CPU_SMT_FORCE_DISABLED) { + enum cpuhp_smt_control old = cpu_smt_control; + + ret = cpuhp_smt_enable(); + if (ret) + goto out; + ret = cpuhp_smt_disable(old); + if (ret) + goto out; + } +out: + cpu_hotplug_disable(); + return ret; +} diff --git a/include/linux/cpu.h b/include/linux/cpu.h index 3813fe45effd..fcb1386bb0d4 100644 --- a/include/linux/cpu.h +++ b/include/linux/cpu.h @@ -201,10 +201,14 @@ enum cpuhp_smt_control { extern enum cpuhp_smt_control cpu_smt_control; extern void cpu_smt_disable(bool force); extern void cpu_smt_check_topology(void); +extern int cpuhp_smt_enable(void); +extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval); #else # define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED) static inline void cpu_smt_disable(bool force) { } static inline void cpu_smt_check_topology(void) { } +static inline int cpuhp_smt_enable(void) { return 0; } +static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; } #endif /* diff --git a/kernel/cpu.c b/kernel/cpu.c index f2ef10460698..077fde6fb953 100644 --- a/kernel/cpu.c +++ b/kernel/cpu.c @@ -2061,7 +2061,7 @@ static void cpuhp_online_cpu_device(unsigned int cpu) kobject_uevent(&dev->kobj, KOBJ_ONLINE); } -static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) +int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { int cpu, ret = 0; @@ -2093,7 +2093,7 @@ static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) return ret; } -static int cpuhp_smt_enable(void) +int cpuhp_smt_enable(void) { int cpu, ret = 0; diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c index c8c272df7154..40dadbee7aa5 100644 --- a/kernel/power/hibernate.c +++ b/kernel/power/hibernate.c @@ -257,6 +257,10 @@ void swsusp_show_speed(ktime_t start, ktime_t stop, (kps % 1000) / 10); } +__weak int arch_resume_nosmt(void) +{ +} + /** * create_image - Create a hibernation image. * @platform_mode: Whether or not to use the platform driver. @@ -324,6 +328,10 @@ static int create_image(int platform_mode) Enable_cpus: suspend_enable_secondary_cpus(); + /* Allow architectures to do nosmt-specific post-resume dances */ + if (!in_suspend) + error = arch_resume_nosmt(); + Platform_finish: platform_finish(platform_mode);