Message ID | nycvar.YFH.7.76.1905300007470.1962@cbobk.fhfr.pm (mailing list archive) |
---|---|
State | Accepted, archived |
Delegated to: | Rafael Wysocki |
Series | [v4] x86/power: Fix 'nosmt' vs. hibernation triple fault during resume |
On Thu, May 30, 2019 at 12:09 AM Jiri Kosina <jikos@kernel.org> wrote:
>
> From: Jiri Kosina <jkosina@suse.cz>
>
> As explained in
>
>   0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
>
> we always, no matter what, have to bring up x86 HT siblings during boot at
> least once in order to avoid first MCE bringing the system to its knees.
>
> That means that whenever 'nosmt' is supplied on the kernel command-line,
> all the HT siblings are as a result sitting in mwait or cpuidle after
> going through the online-offline cycle at least once.
>
> This causes a serious issue though when a kernel, which saw 'nosmt' on its
> command line, is going to perform resume from hibernation: if the resume
> from the hibernated image is successful, cr3 is flipped in order to point
> to the address space of the kernel that is being resumed, which in turn
> means that all the HT siblings are all of a sudden mwaiting on an address
> which is no longer valid.
>
> That results in a triple fault shortly after cr3 is switched, and the
> machine reboots.
>
> Fix this by always waking up all the SMT siblings before initiating the
> 'restore from hibernation' process; this guarantees that all the HT
> siblings will be properly carried over to the resumed kernel waiting in
> resume_play_dead(), and acted upon accordingly afterwards, based on the
> target kernel configuration.
> Symmetrically, the resumed kernel has to push the SMT siblings to mwait
> again in case it has SMT disabled; this means it has to online all
> the siblings when resuming (so that they come out of hlt) and offline
> them again to let them reach mwait.
>
> Cc: stable@vger.kernel.org # v4.19+
> Debugged-by: Thomas Gleixner <tglx@linutronix.de>
> Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once")
> Signed-off-by: Jiri Kosina <jkosina@suse.cz>

LGTM

And I would prefer this one to go in through the PM tree due to the
hibernate core changes, so can I get an ACK from the x86 arch side here,
please?

> ---
>
> v1 -> v2:
> - restructure error handling as suggested by peterz
> - add Rafael's ack
>
> v2 -> v3:
> - added extra online/offline dance for nosmt case during
>   resume, as we want the siblings to be in mwait, not hlt
> - dropped peterz's and Rafael's acks for now due to the above
>
> v3 -> v4:
> - fix undefined return value from arch_resume_nosmt() in case
>   it's not overridden by arch
>
>  arch/x86/power/cpu.c       | 10 ++++++++++
>  arch/x86/power/hibernate.c | 33 +++++++++++++++++++++++++++++++++
>  include/linux/cpu.h        |  4 ++++
>  kernel/cpu.c               |  4 ++--
>  kernel/power/hibernate.c   |  9 +++++++++
>  5 files changed, 58 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index a7d966964c6f..513ce09e9950 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -299,7 +299,17 @@ int hibernate_resume_nonboot_cpu_disable(void)
>           * address in its instruction pointer may not be possible to resolve
>           * any more at that point (the page tables used by it previously may
>           * have been overwritten by hibernate image data).
> +         *
> +         * First, make sure that we wake up all the potentially disabled SMT
> +         * threads which have been initially brought up and then put into
> +         * mwait/cpuidle sleep.
> +         * Those will be put to proper (not interfering with hibernation
> +         * resume) sleep afterwards, and the resumed kernel will decide itself
> +         * what to do with them.
>           */
> +        ret = cpuhp_smt_enable();
> +        if (ret)
> +                return ret;
>          smp_ops.play_dead = resume_play_dead;
>          ret = disable_nonboot_cpus();
>          smp_ops.play_dead = play_dead;
> diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
> index 4845b8c7be7f..fc413717a45f 100644
> --- a/arch/x86/power/hibernate.c
> +++ b/arch/x86/power/hibernate.c
> @@ -11,6 +11,7 @@
>  #include <linux/suspend.h>
>  #include <linux/scatterlist.h>
>  #include <linux/kdebug.h>
> +#include <linux/cpu.h>
>
>  #include <crypto/hash.h>
>
> @@ -245,3 +246,35 @@ int relocate_restore_code(void)
>          __flush_tlb_all();
>          return 0;
>  }
> +
> +int arch_resume_nosmt(void)
> +{
> +        int ret = 0;
> +        /*
> +         * We reached this while coming out of hibernation. This means
> +         * that SMT siblings are sleeping in hlt, as mwait is not safe
> +         * against control transition during resume (see comment in
> +         * hibernate_resume_nonboot_cpu_disable()).
> +         *
> +         * If the resumed kernel has SMT disabled, we have to take all the
> +         * SMT siblings out of hlt, and offline them again so that they
> +         * end up in mwait proper.
> +         *
> +         * Called with hotplug disabled.
> +         */
> +        cpu_hotplug_enable();
> +        if (cpu_smt_control == CPU_SMT_DISABLED ||
> +            cpu_smt_control == CPU_SMT_FORCE_DISABLED) {
> +                enum cpuhp_smt_control old = cpu_smt_control;
> +
> +                ret = cpuhp_smt_enable();
> +                if (ret)
> +                        goto out;
> +                ret = cpuhp_smt_disable(old);
> +                if (ret)
> +                        goto out;
> +        }
> +out:
> +        cpu_hotplug_disable();
> +        return ret;
> +}
> diff --git a/include/linux/cpu.h b/include/linux/cpu.h
> index 3813fe45effd..fcb1386bb0d4 100644
> --- a/include/linux/cpu.h
> +++ b/include/linux/cpu.h
> @@ -201,10 +201,14 @@ enum cpuhp_smt_control {
>  extern enum cpuhp_smt_control cpu_smt_control;
>  extern void cpu_smt_disable(bool force);
>  extern void cpu_smt_check_topology(void);
> +extern int cpuhp_smt_enable(void);
> +extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
>  #else
>  # define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED)
>  static inline void cpu_smt_disable(bool force) { }
>  static inline void cpu_smt_check_topology(void) { }
> +static inline int cpuhp_smt_enable(void) { return 0; }
> +static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
>  #endif
>
>  /*
> diff --git a/kernel/cpu.c b/kernel/cpu.c
> index f2ef10460698..077fde6fb953 100644
> --- a/kernel/cpu.c
> +++ b/kernel/cpu.c
> @@ -2061,7 +2061,7 @@ static void cpuhp_online_cpu_device(unsigned int cpu)
>          kobject_uevent(&dev->kobj, KOBJ_ONLINE);
>  }
>
> -static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
> +int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
>  {
>          int cpu, ret = 0;
>
> @@ -2093,7 +2093,7 @@ static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
>          return ret;
>  }
>
> -static int cpuhp_smt_enable(void)
> +int cpuhp_smt_enable(void)
>  {
>          int cpu, ret = 0;
>
> diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
> index c8c272df7154..b65635753e8e 100644
> --- a/kernel/power/hibernate.c
> +++ b/kernel/power/hibernate.c
> @@ -257,6 +257,11 @@ void swsusp_show_speed(ktime_t start, ktime_t stop,
>                          (kps % 1000) / 10);
>  }
>
> +__weak int arch_resume_nosmt(void)
> +{
> +        return 0;
> +}
> +
>  /**
>   * create_image - Create a hibernation image.
>   * @platform_mode: Whether or not to use the platform driver.
> @@ -324,6 +329,10 @@ static int create_image(int platform_mode)
>   Enable_cpus:
>          suspend_enable_secondary_cpus();
>
> +        /* Allow architectures to do nosmt-specific post-resume dances */
> +        if (!in_suspend)
> +                error = arch_resume_nosmt();
> +
>   Platform_finish:
>          platform_finish(platform_mode);
>
> --
> Jiri Kosina
> SUSE Labs
>

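For context: resume_play_dead(), which the siblings are left waiting in while the image kernel takes over, parks them in hlt rather than mwait. A rough sketch of that path, recalled from arch/x86/power/cpu.c and arch/x86/kernel/smpboot.c of that era (details may differ from the exact tree), looks like this:

/* arch/x86/power/cpu.c -- sketch, not the literal tree contents */
static void resume_play_dead(void)
{
        play_dead_common();                     /* mark the CPU offline, disable IRQs */
        tboot_shutdown(TB_SHUTDOWN_RESUME);
        hlt_play_dead();                        /* park in hlt instead of mwait */
}

/* arch/x86/kernel/smpboot.c -- sketch */
void hlt_play_dead(void)
{
        if (__this_cpu_read(cpu_info.x86) >= 4)
                wbinvd();

        while (1)
                native_halt();
}

The patch only has to guarantee that every sibling goes through this hlt path (instead of mwait_play_dead()) before control is handed over to the resumed kernel.
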
On Thu 2019-05-30 00:09:39, Jiri Kosina wrote: > From: Jiri Kosina <jkosina@suse.cz> > > As explained in > > 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > we always, no matter what, have to bring up x86 HT siblings during boot at > least once in order to avoid first MCE bringing the system to its knees. > > That means that whenever 'nosmt' is supplied on the kernel command-line, > all the HT siblings are as a result sitting in mwait or cpudile after > going through the online-offline cycle at least once. > > This causes a serious issue though when a kernel, which saw 'nosmt' on its > commandline, is going to perform resume from hibernation: if the resume > from the hibernated image is successful, cr3 is flipped in order to point > to the address space of the kernel that is being resumed, which in turn > means that all the HT siblings are all of a sudden mwaiting on address > which is no longer valid. > > That results in triple fault shortly after cr3 is switched, and machine > reboots. > > Fix this by always waking up all the SMT siblings before initiating the > 'restore from hibernation' process; this guarantees that all the HT > siblings will be properly carried over to the resumed kernel waiting in > resume_play_dead(), and acted upon accordingly afterwards, based on the > target kernel configuration. > Symmetricaly, the resumed kernel has to push the SMT siblings to mwait > again in case it has SMT disabled; this means it has to online all > the siblings when resuming (so that they come out of hlt) and offline > them again to let them reach mwait. > > Cc: stable@vger.kernel.org # v4.19+ > Debugged-by: Thomas Gleixner <tglx@linutronix.de> > Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > Signed-off-by: Jiri Kosina <jkosina@suse.cz> Acked-by: Pavel Machek <pavel@ucw.cz>
On Thu, 30 May 2019, Rafael J. Wysocki wrote: > > > > Cc: stable@vger.kernel.org # v4.19+ > > Debugged-by: Thomas Gleixner <tglx@linutronix.de> > > Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > Signed-off-by: Jiri Kosina <jkosina@suse.cz> > > LGTM > > And I would prefer this one to go in through the PM tree due to the > hibernate core changes, Ok. > so can I get an ACK from the x86 arch side here, please? No. Is the following good enough? Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Thanks, tglx
On Thu, May 30, 2019 at 11:27 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > On Thu, 30 May 2019, Rafael J. Wysocki wrote: > > > > > > Cc: stable@vger.kernel.org # v4.19+ > > > Debugged-by: Thomas Gleixner <tglx@linutronix.de> > > > Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > > Signed-off-by: Jiri Kosina <jkosina@suse.cz> > > > > LGTM > > > > And I would prefer this one to go in through the PM tree due to the > > hibernate core changes, > > Ok. > > > so can I get an ACK from the x86 arch side here, please? > > No. Is the following good enough? > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Yes, it is, thanks!
On Thu, May 30, 2019 at 11:38:51PM +0200, Rafael J. Wysocki wrote: > On Thu, May 30, 2019 at 11:27 PM Thomas Gleixner <tglx@linutronix.de> wrote: > > > > On Thu, 30 May 2019, Rafael J. Wysocki wrote: > > > > > > > > Cc: stable@vger.kernel.org # v4.19+ > > > > Debugged-by: Thomas Gleixner <tglx@linutronix.de> > > > > Fixes: 0cc3cd21657b ("cpu/hotplug: Boot HT siblings at least once") > > > > Signed-off-by: Jiri Kosina <jkosina@suse.cz> > > > > > > LGTM > > > > > > And I would prefer this one to go in through the PM tree due to the > > > hibernate core changes, > > > > Ok. > > > > > so can I get an ACK from the x86 arch side here, please? > > > > No. Is the following good enough? > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > Yes, it is, thanks! I still think changing monitor/mwait to use a fixmap address would be a much cleaner way to fix this. I can try to work up a patch tomorrow.
On Thu, 30 May 2019, Josh Poimboeuf wrote:

> > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
> >
> > Yes, it is, thanks!
>
> I still think changing monitor/mwait to use a fixmap address would be a
> much cleaner way to fix this. I can try to work up a patch tomorrow.

I disagree with that from the backwards compatibility point of view.

I personally am quite frequently using different combinations of
resumer/resumee kernels, and I've never been bitten by it so far. I'd
guess I am not the only one.

Fixmap sort of breaks that invariant.

Thanks,
On Fri, May 31, 2019 at 01:42:02AM +0200, Jiri Kosina wrote: > On Thu, 30 May 2019, Josh Poimboeuf wrote: > > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > > > > > Yes, it is, thanks! > > > > I still think changing monitor/mwait to use a fixmap address would be a > > much cleaner way to fix this. I can try to work up a patch tomorrow. > > I disagree with that from the backwards compatibility point of view. > > I personally am quite frequently using differnet combinations of > resumer/resumee kernels, and I've never been biten by it so far. I'd guess > I am not the only one. > Fixmap sort of breaks that invariant. Right now there is no backwards compatibility because nosmt resume is already broken. For "future" backwards compatibility we could just define a hard-coded reserved fixmap page address, adjacent to the vsyscall reserved address. Something like this (not yet tested)? Maybe we could also remove the resume_play_dead() hack? diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h index 9da8cccdf3fb..1c328624162c 100644 --- a/arch/x86/include/asm/fixmap.h +++ b/arch/x86/include/asm/fixmap.h @@ -80,6 +80,7 @@ enum fixed_addresses { #ifdef CONFIG_X86_VSYSCALL_EMULATION VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT, #endif + FIX_MWAIT = (FIXADDR_TOP - VSYSCALL_ADDR - 1) >> PAGE_SHIFT, #endif FIX_DBGP_BASE, FIX_EARLYCON_MEM_BASE, diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c index 73e69aaaa117..9804fbe25d03 100644 --- a/arch/x86/kernel/smpboot.c +++ b/arch/x86/kernel/smpboot.c @@ -108,6 +108,8 @@ int __read_mostly __max_smt_threads = 1; /* Flag to indicate if a complete sched domain rebuild is required */ bool x86_topology_update; +static char __mwait_page[PAGE_SIZE]; + int arch_update_cpu_topology(void) { int retval = x86_topology_update; @@ -1319,6 +1321,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) smp_quirk_init_udelay(); speculative_store_bypass_ht_init(); + + set_fixmap(FIX_MWAIT, __pa_symbol(&__mwait_page)); } void arch_enable_nonboot_cpus_begin(void) @@ -1631,11 +1635,12 @@ static inline void mwait_play_dead(void) } /* - * This should be a memory location in a cache line which is - * unlikely to be touched by other processors. The actual - * content is immaterial as it is not actually modified in any way. + * This memory location is never actually written to. It's mapped at a + * reserved fixmap address to ensure the monitored address remains + * valid across a hibernation resume operation. Otherwise a triple + * fault can occur. */ - mwait_ptr = ¤t_thread_info()->flags; + mwait_ptr = (void *)fix_to_virt(FIX_MWAIT); wbinvd();
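For reference, the loop being changed here lives in mwait_play_dead(); trimmed down from memory (the CPUID/C-state selection and errata handling are omitted, so treat this as a sketch rather than the exact code), it is essentially:

static inline void mwait_play_dead(void)
{
        unsigned int eax = 0;   /* deepest C-state hint, normally derived from CPUID */
        void *mwait_ptr;

        /* ... feature checks and C-state selection elided ... */

        mwait_ptr = &current_thread_info()->flags;      /* what the patch above replaces */

        wbinvd();

        while (1) {
                mb();
                clflush(mwait_ptr);             /* errata workaround */
                mb();
                __monitor(mwait_ptr, 0, 0);     /* arm the monitor on mwait_ptr */
                mb();
                __mwait(eax, 0);                /* sleep until an event or a write to mwait_ptr */
        }
}

Any store to the monitored cache line wakes the CPU, which is why monitoring a location inside the old kernel's memory (current_thread_info()->flags) becomes dangerous once the hibernation image starts overwriting it.
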
On Friday, May 31, 2019 7:14:56 AM CEST Josh Poimboeuf wrote: > On Fri, May 31, 2019 at 01:42:02AM +0200, Jiri Kosina wrote: > > On Thu, 30 May 2019, Josh Poimboeuf wrote: > > > > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > > > > > > > Yes, it is, thanks! > > > > > > I still think changing monitor/mwait to use a fixmap address would be a > > > much cleaner way to fix this. I can try to work up a patch tomorrow. > > > > I disagree with that from the backwards compatibility point of view. > > > > I personally am quite frequently using differnet combinations of > > resumer/resumee kernels, and I've never been biten by it so far. I'd guess > > I am not the only one. > > Fixmap sort of breaks that invariant. > > Right now there is no backwards compatibility because nosmt resume is > already broken. > > For "future" backwards compatibility we could just define a hard-coded > reserved fixmap page address, adjacent to the vsyscall reserved address. > > Something like this (not yet tested)? Maybe we could also remove the > resume_play_dead() hack? Yes, we can IMO, but in a separate patch, please. > diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h > index 9da8cccdf3fb..1c328624162c 100644 > --- a/arch/x86/include/asm/fixmap.h > +++ b/arch/x86/include/asm/fixmap.h > @@ -80,6 +80,7 @@ enum fixed_addresses { > #ifdef CONFIG_X86_VSYSCALL_EMULATION > VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT, > #endif > + FIX_MWAIT = (FIXADDR_TOP - VSYSCALL_ADDR - 1) >> PAGE_SHIFT, > #endif > FIX_DBGP_BASE, > FIX_EARLYCON_MEM_BASE, > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 73e69aaaa117..9804fbe25d03 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -108,6 +108,8 @@ int __read_mostly __max_smt_threads = 1; > /* Flag to indicate if a complete sched domain rebuild is required */ > bool x86_topology_update; > > +static char __mwait_page[PAGE_SIZE]; > + > int arch_update_cpu_topology(void) > { > int retval = x86_topology_update; > @@ -1319,6 +1321,8 @@ void __init native_smp_prepare_cpus(unsigned int max_cpus) > smp_quirk_init_udelay(); > > speculative_store_bypass_ht_init(); > + > + set_fixmap(FIX_MWAIT, __pa_symbol(&__mwait_page)); > } > > void arch_enable_nonboot_cpus_begin(void) > @@ -1631,11 +1635,12 @@ static inline void mwait_play_dead(void) > } > > /* > - * This should be a memory location in a cache line which is > - * unlikely to be touched by other processors. The actual > - * content is immaterial as it is not actually modified in any way. > + * This memory location is never actually written to. It's mapped at a > + * reserved fixmap address to ensure the monitored address remains > + * valid across a hibernation resume operation. Otherwise a triple > + * fault can occur. > */ > - mwait_ptr = ¤t_thread_info()->flags; > + mwait_ptr = (void *)fix_to_virt(FIX_MWAIT); > > wbinvd(); > > Jiri, any chance to test this?
On Fri, 31 May 2019, Josh Poimboeuf wrote: > > I disagree with that from the backwards compatibility point of view. > > > > I personally am quite frequently using differnet combinations of > > resumer/resumee kernels, and I've never been biten by it so far. I'd guess > > I am not the only one. > > Fixmap sort of breaks that invariant. > > Right now there is no backwards compatibility because nosmt resume is > already broken. Yeah, well, but that's "only" for nosmt kernels at least. > For "future" backwards compatibility we could just define a hard-coded > reserved fixmap page address, adjacent to the vsyscall reserved address. > > Something like this (not yet tested)? Maybe we could also remove the > resume_play_dead() hack? Does it also solve cpuidle case? I have no overview what all the cpuidle drivers might be potentially doing in their ->enter_dead() callbacks. Rafael? Thanks,
On Friday, May 31, 2019 10:47:21 AM CEST Jiri Kosina wrote: > On Fri, 31 May 2019, Josh Poimboeuf wrote: > > > > I disagree with that from the backwards compatibility point of view. > > > > > > I personally am quite frequently using differnet combinations of > > > resumer/resumee kernels, and I've never been biten by it so far. I'd guess > > > I am not the only one. > > > Fixmap sort of breaks that invariant. > > > > Right now there is no backwards compatibility because nosmt resume is > > already broken. > > Yeah, well, but that's "only" for nosmt kernels at least. > > > For "future" backwards compatibility we could just define a hard-coded > > reserved fixmap page address, adjacent to the vsyscall reserved address. > > > > Something like this (not yet tested)? Maybe we could also remove the > > resume_play_dead() hack? > > Does it also solve cpuidle case? I have no overview what all the cpuidle > drivers might be potentially doing in their ->enter_dead() callbacks. > Rafael? There are just two of them, ACPI cpuidle and intel_idle, and they both should be covered. In any case, I think that this is the way to go here even though it may be somewhat problematic to start with. Cheers, Rafael
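For reference, the ->enter_dead() dispatch Rafael refers to sits in cpuidle_play_dead(); roughly (quoted from memory of drivers/cpuidle/cpuidle.c, so details may differ slightly):

int cpuidle_play_dead(void)
{
        struct cpuidle_device *dev = __this_cpu_read(cpuidle_devices);
        struct cpuidle_driver *drv = cpuidle_get_cpu_driver(dev);
        int i;

        if (!drv)
                return -ENODEV;

        /* Use the deepest state that provides an enter_dead() callback. */
        for (i = drv->state_count - 1; i >= 0; i--)
                if (drv->states[i].enter_dead)
                        return drv->states[i].enter_dead(dev, i);

        return -ENODEV;
}

So whatever intel_idle or the ACPI driver do in their enter_dead() callbacks is what an offlined sibling may be sitting in after the initial online/offline cycle, hence the question above.
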
On Fri, 31 May 2019, Josh Poimboeuf wrote: > Something like this (not yet tested)? Maybe we could also remove the > resume_play_dead() hack? I tried to test this, but the resumed kernel doesn't seem to be healthy for reason I don't understand yet. Symptoms I've seen so far -- 'dazed and confused NMI', spontaneous reboot, userspace segfault. > diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h > index 9da8cccdf3fb..1c328624162c 100644 > --- a/arch/x86/include/asm/fixmap.h > +++ b/arch/x86/include/asm/fixmap.h > @@ -80,6 +80,7 @@ enum fixed_addresses { > #ifdef CONFIG_X86_VSYSCALL_EMULATION > VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT, > #endif > + FIX_MWAIT = (FIXADDR_TOP - VSYSCALL_ADDR - 1) >> PAGE_SHIFT, Two things to this: - you don't seem to fix x86_32 - shouldn't it rather be FIXADDR_TOP - VSYSCALL_ADDR + 1 instead? Thanks,
On Fri 2019-05-31 01:42:02, Jiri Kosina wrote: > On Thu, 30 May 2019, Josh Poimboeuf wrote: > > > > > Reviewed-by: Thomas Gleixner <tglx@linutronix.de> > > > > > > Yes, it is, thanks! > > > > I still think changing monitor/mwait to use a fixmap address would be a > > much cleaner way to fix this. I can try to work up a patch tomorrow. > > I disagree with that from the backwards compatibility point of view. > > I personally am quite frequently using differnet combinations of > resumer/resumee kernels, and I've never been biten by it so far. I'd guess > I am not the only one. > Fixmap sort of breaks that invariant. If we get less tricky code, it may be worth it... Pavel
On Fri, May 31, 2019 at 1:57 AM Rafael J. Wysocki <rjw@rjwysocki.net> wrote: > > On Friday, May 31, 2019 10:47:21 AM CEST Jiri Kosina wrote: > > On Fri, 31 May 2019, Josh Poimboeuf wrote: > > > > > > I disagree with that from the backwards compatibility point of view. > > > > > > > > I personally am quite frequently using differnet combinations of > > > > resumer/resumee kernels, and I've never been biten by it so far. I'd guess > > > > I am not the only one. > > > > Fixmap sort of breaks that invariant. > > > > > > Right now there is no backwards compatibility because nosmt resume is > > > already broken. > > > > Yeah, well, but that's "only" for nosmt kernels at least. > > > > > For "future" backwards compatibility we could just define a hard-coded > > > reserved fixmap page address, adjacent to the vsyscall reserved address. > > > > > > Something like this (not yet tested)? Maybe we could also remove the > > > resume_play_dead() hack? > > > > Does it also solve cpuidle case? I have no overview what all the cpuidle > > drivers might be potentially doing in their ->enter_dead() callbacks. > > Rafael? > > There are just two of them, ACPI cpuidle and intel_idle, and they both should > be covered. > > In any case, I think that this is the way to go here even though it may be somewhat > problematic to start with. > Given that there seems to be a genuine compatibility issue right now, can we design an actual sane way to hand off control of all CPUs rather than adding duct tape to an extremely fragile mechanism? I can think of at least two sensible solutions: 1. Have a self-contained "play dead for kexec/resume" function that touches only few well-defined physical pages: a set of page tables and a page of code. Load CR3 to point to those page tables, fill in the code with some form of infinite loop, and run it. Or just turn off paging entirely and run the infinite loop. Have the kernel doing the resuming inform the kernel being resumed of which pages these are, and have the kernel being resumed take over all CPUs before reusing the pages. 2. Put the CPU all the way to sleep by sending it an INIT IPI. Version 2 seems very simple and robust. Is there a reason we can't do it? We obviously don't want to do it for normal offline because it might be a high-power state, but a cpu in the wait-for-SIPI state is not going to exit that state all by itself. The patch to implement #2 should be short and sweet as long as we are careful to only put genuine APs to sleep like this. The only downside I can see is that an new kernel resuming and old kernel that was booted with nosmt is going to waste power, but I don't think that's a showstopper.
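A minimal sketch of what option 2 could look like, modeled on the INIT assert/deassert sequence the kernel already uses in wakeup_secondary_cpu_via_init(); the helper name and call site are made up here and the delay is only indicative, so this is just to illustrate the idea:

/* Hypothetical helper: push one AP into wait-for-SIPI before handing over. */
static void park_ap_via_init(int phys_apicid)
{
        /* Assert INIT */
        apic_icr_write(APIC_INT_LEVELTRIG | APIC_INT_ASSERT | APIC_DM_INIT,
                       phys_apicid);
        safe_apic_wait_icr_idle();

        udelay(10);             /* mirrors init_udelay in smpboot.c */

        /* De-assert INIT; the AP now sits in wait-for-SIPI */
        apic_icr_write(APIC_INT_LEVELTRIG | APIC_DM_INIT, phys_apicid);
        safe_apic_wait_icr_idle();
}

The resumed kernel would then bring such CPUs back with the usual INIT/SIPI/SIPI sequence when (and if) it onlines them.
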
On Fri, 31 May 2019, Andy Lutomirski wrote: > 2. Put the CPU all the way to sleep by sending it an INIT IPI. > > Version 2 seems very simple and robust. Is there a reason we can't do > it? We obviously don't want to do it for normal offline because it > might be a high-power state, but a cpu in the wait-for-SIPI state is > not going to exit that state all by itself. > > The patch to implement #2 should be short and sweet as long as we are > careful to only put genuine APs to sleep like this. The only downside > I can see is that an new kernel resuming and old kernel that was > booted with nosmt is going to waste power, but I don't think that's a > showstopper. Well, if *that* is not an issue, than the original 3-liner that just forces them to 'hlt' [1] would be good enough as well. [1] https://lore.kernel.org/lkml/nycvar.YFH.7.76.1905291230130.1962@cbobk.fhfr.pm/ Thanks,
On Fri, 31 May 2019, Jiri Kosina wrote: > On Fri, 31 May 2019, Andy Lutomirski wrote: > > > 2. Put the CPU all the way to sleep by sending it an INIT IPI. > > > > Version 2 seems very simple and robust. Is there a reason we can't do > > it? We obviously don't want to do it for normal offline because it > > might be a high-power state, but a cpu in the wait-for-SIPI state is > > not going to exit that state all by itself. > > > > The patch to implement #2 should be short and sweet as long as we are > > careful to only put genuine APs to sleep like this. The only downside > > I can see is that an new kernel resuming and old kernel that was > > booted with nosmt is going to waste power, but I don't think that's a > > showstopper. > > Well, if *that* is not an issue, than the original 3-liner that just > forces them to 'hlt' [1] would be good enough as well. Actually no, scratch that, I misunderstood your proposal, sorry.
> On May 31, 2019, at 7:31 AM, Jiri Kosina <jikos@kernel.org> wrote: > >> On Fri, 31 May 2019, Andy Lutomirski wrote: >> >> 2. Put the CPU all the way to sleep by sending it an INIT IPI. >> >> Version 2 seems very simple and robust. Is there a reason we can't do >> it? We obviously don't want to do it for normal offline because it >> might be a high-power state, but a cpu in the wait-for-SIPI state is >> not going to exit that state all by itself. >> >> The patch to implement #2 should be short and sweet as long as we are >> careful to only put genuine APs to sleep like this. The only downside >> I can see is that an new kernel resuming and old kernel that was >> booted with nosmt is going to waste power, but I don't think that's a >> showstopper. > > Well, if *that* is not an issue, than the original 3-liner that just > forces them to 'hlt' [1] would be good enough as well. > > Seems okay to me as long as we’re confident we won’t get a spurious interrupt. In general, I don’t think we’re ever suppose to rely on mwait *staying* asleep. As I understand it, mwait can wake up whenever it wants, and the only real guarantee we have is that the CPU makes some effort to stay asleep until an interrupt is received or the monitor address is poked. As a trivial example, if we are in a VM and we get scheduled out at any point between MONITOR and the eventual intentional wakeup, we’re toast. Same if we get an SMI due to bad luck or due to a thermal event happening shortly after pushing the power button to resume from hibernate. For that matter, what actually happens if we get an SMI while halted? Does RSM go directly to sleep or does it re-fetch the HLT? It seems to me that we should just avoid the scenario where we have IP pointed to a bogus address and we just cross our fingers and hope the CPU doesn’t do anything. I think that, as a short term fix, we should use HLT and, as a long term fix, we should either keep the CPU state fully valid or we should hard-offline the CPU using documented mechanisms, e.g. the WAIT-for-SIPI state.
On Fri, 31 May 2019, Josh Poimboeuf wrote: > > I personally am quite frequently using differnet combinations of > > resumer/resumee kernels, and I've never been biten by it so far. I'd guess > > I am not the only one. > > Fixmap sort of breaks that invariant. > > Right now there is no backwards compatibility because nosmt resume is > already broken. > > For "future" backwards compatibility we could just define a hard-coded > reserved fixmap page address, adjacent to the vsyscall reserved address. > > Something like this (not yet tested)? Maybe we could also remove the > resume_play_dead() hack? Looking into SDM: ===== A store to the address range armed by the MONITOR instruction, an interrupt, an NMI or SMI, a debug exception, a machine check exception, the BINIT# signal, the INIT# signal, or the RESET# signal will exit the implementation-dependent-optimized state. ===== And mwait doesn't have the 'auto-restart on SMM exit' like hlt does. So I guess that's why I am seeing the triple faults even with your (fixed, see below) patch as well. So I don't think we can safely use this aproach. > > diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h > index 9da8cccdf3fb..1c328624162c 100644 > --- a/arch/x86/include/asm/fixmap.h > +++ b/arch/x86/include/asm/fixmap.h > @@ -80,6 +80,7 @@ enum fixed_addresses { > #ifdef CONFIG_X86_VSYSCALL_EMULATION > VSYSCALL_PAGE = (FIXADDR_TOP - VSYSCALL_ADDR) >> PAGE_SHIFT, > #endif > + FIX_MWAIT = (FIXADDR_TOP - VSYSCALL_ADDR - 1) >> PAGE_SHIFT, > #endif > FIX_DBGP_BASE, > FIX_EARLYCON_MEM_BASE, > diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c > index 73e69aaaa117..9804fbe25d03 100644 > --- a/arch/x86/kernel/smpboot.c > +++ b/arch/x86/kernel/smpboot.c > @@ -108,6 +108,8 @@ int __read_mostly __max_smt_threads = 1; > /* Flag to indicate if a complete sched domain rebuild is required */ > bool x86_topology_update; > > +static char __mwait_page[PAGE_SIZE]; This needs to be __align(PAGE_SIZE) in order for the fixmap to work properly.
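In other words, for set_fixmap() to map exactly that buffer, the declaration would have to be page-aligned, e.g. (using the kernel's __aligned() attribute; sketch only):

/* Backing page for the FIX_MWAIT slot; must start on a page boundary. */
static char __mwait_page[PAGE_SIZE] __aligned(PAGE_SIZE);
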
On Fri, 31 May 2019, Andy Lutomirski wrote: > For that matter, what actually happens if we get an SMI while halted? > Does RSM go directly to sleep or does it re-fetch the HLT? Our mails just crossed, I replied to Josh's mwait() proposal patch a minute ago. HLT is guaranteed to be re-entered if SMM interrupted it, while MWAIT is not. So as a short-term fix for 5.2, I still believe in v4 of my patch that does the mwait->hlt->mwait transition across hibernate/resume, and for 5.3 I can look into forcing it to wait-for-SIPI proper.
On Fri, May 31, 2019 at 04:54:20PM +0200, Jiri Kosina wrote: > On Fri, 31 May 2019, Andy Lutomirski wrote: > > > For that matter, what actually happens if we get an SMI while halted? > > Does RSM go directly to sleep or does it re-fetch the HLT? > > Our mails just crossed, I replied to Josh's mwait() proposal patch a > minute ago. Good catch. I agree that mwait seems unsafe across resume and my patch is bogus. Andy, in the short term it sounds like you're proposing to make native_play_dead() use hlt_play_dead() unconditionally. Right? That would simplify things and also would fix Jiri's bug I think. The only question I'd have is if we have data on the power savings difference between hlt and mwait. mwait seems to wake up on a lot of different conditions which might negate its deeper sleep state. Andy, for your long term idea to use INIT IPI, I wonder if that would work with SMT siblings? Specifically I wonder about the Intel issue that requires siblings to have CR4.MCE set.
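For clarity, using hlt unconditionally would roughly mean the following change in native_play_dead() (sketched from memory of arch/x86/kernel/smpboot.c; the real function tries mwait_play_dead() and cpuidle_play_dead() before falling back to hlt):

void native_play_dead(void)
{
        play_dead_common();
        tboot_shutdown(TB_SHUTDOWN_WFS);

        /* Hypothetical change: never use MONITOR/MWAIT or cpuidle here. */
        hlt_play_dead();
}
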
On Fri, 31 May 2019, Josh Poimboeuf wrote: > The only question I'd have is if we have data on the power savings > difference between hlt and mwait. mwait seems to wake up on a lot of > different conditions which might negate its deeper sleep state. hlt wakes up on basically the same set of events, but has the auto-restarting semantics on some of them (especially SMM). So the wakeup frequency itself shouldn't really contribute to power consumption difference; it's the C-state that mwait allows CPU to enter. Thanks,
On Fri, May 31, 2019 at 05:41:18PM +0200, Jiri Kosina wrote: > On Fri, 31 May 2019, Josh Poimboeuf wrote: > > > The only question I'd have is if we have data on the power savings > > difference between hlt and mwait. mwait seems to wake up on a lot of > > different conditions which might negate its deeper sleep state. > > hlt wakes up on basically the same set of events, but has the > auto-restarting semantics on some of them (especially SMM). So the wakeup > frequency itself shouldn't really contribute to power consumption > difference; it's the C-state that mwait allows CPU to enter. Ok. I reluctantly surrender :-) For your v4: Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> It works as a short term fix, but it's fragile, and it does feel like we're just adding more duct tape, as Andy said.
On Fri, May 31, 2019 at 7:54 AM Jiri Kosina <jikos@kernel.org> wrote: > > On Fri, 31 May 2019, Andy Lutomirski wrote: > > > For that matter, what actually happens if we get an SMI while halted? > > Does RSM go directly to sleep or does it re-fetch the HLT? > > Our mails just crossed, I replied to Josh's mwait() proposal patch a > minute ago. > > HLT is guaranteed to be re-entered if SMM interrupted it, while MWAIT is > not. > > So as a short-term fix for 5.2, I still believe in v4 of my patch that > does the mwait->hlt->mwait transition across hibernate/resume, and for 5.3 > I can look into forcing it to wait-for-SIPI proper. > How sure are you? From http://www.rcollins.org/ddj/Mar97/Mar97.html I see: In general, the AutoHALT field directs the microprocessor whether or not to restart the HLT instruction upon exit of SMM. This is accomplished by decrementing EIP and executing whatever instruction resides at that position. AutoHALT restart behavior is consistent, regardless of whether or not EIP-1 contains a HLT instruction. If the SMM handler set Auto HALT[bit0]=1 when the interrupted instruction was not a HLT instruction(AutoHALT[bit0]= 0 upon entrance), they would run the risk of resuming execution at an undesired location. The RSM microcode doesn’t know the length of the interrupted instruction. Therefore when AutoHALT[bit0]=1 upon exit, the RSM microcode blindly decrements the EIP register by 1 and resumes execution. This explains why Intel warns that unpredictable behavior may result from setting this field to restart a HLT instruction when the microprocessor wasn’t in a HALT state upon entrance. Listing One presents an algorithm that describes the AutoHALT Restart feature. The AMD APM says: If the return from SMM takes the processor back to the halt state, the HLT instruction is not re- executed. However, the halt special bus-cycle is driven on the processor bus after the RSM instruction executes. The Intel SDM Vol 3 34.10 says: If the HLT instruction is restarted, the processor will generate a memory access to fetch the HLT instruction (if it is not in the internal cache), and execute a HLT bus transaction. This behavior results in multiple HLT bus transactions for the same HLT instruction. And the SDM is very clear that one should *not* do RSM with auto-halt set if the instruction that was interrupted wasn't HLT. It sounds to me like Intel does not architecturally guarantee that it's okay to do HLT from one CPU and then remove the HLT instruction out from under it on another CPU.
On Fri, May 31, 2019 at 9:19 AM Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > On Fri, May 31, 2019 at 05:41:18PM +0200, Jiri Kosina wrote: > > On Fri, 31 May 2019, Josh Poimboeuf wrote: > > > > > The only question I'd have is if we have data on the power savings > > > difference between hlt and mwait. mwait seems to wake up on a lot of > > > different conditions which might negate its deeper sleep state. > > > > hlt wakes up on basically the same set of events, but has the > > auto-restarting semantics on some of them (especially SMM). So the wakeup > > frequency itself shouldn't really contribute to power consumption > > difference; it's the C-state that mwait allows CPU to enter. > > Ok. I reluctantly surrender :-) For your v4: > > Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> > > It works as a short term fix, but it's fragile, and it does feel like > we're just adding more duct tape, as Andy said. > Just to clarify what I was thinking, it seems like soft-offlining a CPU and resuming a kernel have fundamentally different requirements. To soft-offline a CPU, we want to get power consumption as low as possible and make sure that MCE won't kill the system. It's okay for the CPU to occasionally execute some code. For resume, what we're really doing is trying to hand control of all CPUs from kernel A to kernel B. There are two basic ways to hand off control of a given CPU: we can jump (with JMP, RET, horrible self-modifying code, etc) from one kernel to the other, or we can attempt to make a given CPU stop executing code from either kernel at all and then forcibly wrench control of it in kernel B. Either approach seems okay, but the latter approach depends on getting the CPU to reliably stop executing code. We don't care about power consumption for resume, and I'm not even convinced that we need to be able to survive an MCE that happens while we're resuming, although surviving MCE would be nice. So if we don't want to depend on nasty system details at all, we could have the first kernel explicitly wake up all CPUs and hand them all off to the new kernel, more or less the same way that we hand over control of the BSP right now. Or we can look for a way to tell all the APs to stop executing kernel code, and the only architectural way I know of to do that is to sent an INIT IPI (and then presumably deassert INIT -- the SDM is a bit vague). Or we could allocate a page, stick a GDT, a TSS, and a 1: hlt; jmp 1b in it, turn off paging, and run that code. And then somehow convince the kernel we load not to touch that page until it finishes waking up all CPUs. This seems conceptually simple and very robust, but I'm not sure it fits in with the way hibernation works right now at all.
On Fri, May 31, 2019 at 09:51:09AM -0700, Andy Lutomirski wrote: > Just to clarify what I was thinking, it seems like soft-offlining a > CPU and resuming a kernel have fundamentally different requirements. > To soft-offline a CPU, we want to get power consumption as low as > possible and make sure that MCE won't kill the system. It's okay for > the CPU to occasionally execute some code. For resume, what we're > really doing is trying to hand control of all CPUs from kernel A to > kernel B. There are two basic ways to hand off control of a given > CPU: we can jump (with JMP, RET, horrible self-modifying code, etc) > from one kernel to the other, or we can attempt to make a given CPU > stop executing code from either kernel at all and then forcibly wrench > control of it in kernel B. Either approach seems okay, but the latter > approach depends on getting the CPU to reliably stop executing code. > We don't care about power consumption for resume, and I'm not even > convinced that we need to be able to survive an MCE that happens while > we're resuming, although surviving MCE would be nice. I'd thought you were proposing a global improvement: we get rid of mwait_play_dead() everywhere, i.e. all the time, not just for the resume path. Instead it sounds like you were proposing a local improvement to the resume path, to continue doing what hibernate_resume_nonboot_cpu_disable() is already doing, but use an INIT IPI instead of HLT to make sure the CPU is completely dead. That may be a theoretical improvement but we'd still need to do the whole "wake and play dead" dance which Jiri's patch is doing for offline CPUs. So Jiri's patch looks ok to me.
On Fri, 31 May 2019, Andy Lutomirski wrote: > The Intel SDM Vol 3 34.10 says: > > If the HLT instruction is restarted, the processor will generate a > memory access to fetch the HLT instruction (if it is > not in the internal cache), and execute a HLT bus transaction. This > behavior results in multiple HLT bus transactions > for the same HLT instruction. Which basically means that both hibernation and kexec have been broken in this respect for gazillions of years, and seems like noone noticed. Makes one wonder what the reason for that might be. Either SDM is not precise and the refetch actually never happens for real (or is always in these cases satisfied from I$ perhaps?), or ... ? So my patch basically puts things back where they have been for ages (while mwait is obviously much worse, as that gets woken up by the write to the monitored address, which inevitably does happen during resume), but seems like SDM is suggesting that we've been in a grey zone wrt RSM at least for all those ages. So perhaps we really should ditch resume_play_dead() altogether eventually, and replace it with sending INIT IPI around instead (and then waking the CPUs properly via INIT INIT START). I'd still like to do that for 5.3 though, as that'd be slightly bigger surgery, and conservatively put things basically back to state they have been up to now for 5.2. Thanks,
> On May 31, 2019, at 2:05 PM, Jiri Kosina <jikos@kernel.org> wrote: > >> On Fri, 31 May 2019, Andy Lutomirski wrote: >> >> The Intel SDM Vol 3 34.10 says: >> >> If the HLT instruction is restarted, the processor will generate a >> memory access to fetch the HLT instruction (if it is >> not in the internal cache), and execute a HLT bus transaction. This >> behavior results in multiple HLT bus transactions >> for the same HLT instruction. > > Which basically means that both hibernation and kexec have been broken in > this respect for gazillions of years, and seems like noone noticed. Makes > one wonder what the reason for that might be. > > Either SDM is not precise and the refetch actually never happens for real > (or is always in these cases satisfied from I$ perhaps?), or ... ? > > So my patch basically puts things back where they have been for ages > (while mwait is obviously much worse, as that gets woken up by the write > to the monitored address, which inevitably does happen during resume), but > seems like SDM is suggesting that we've been in a grey zone wrt RSM at > least for all those ages. > > So perhaps we really should ditch resume_play_dead() altogether > eventually, and replace it with sending INIT IPI around instead (and then > waking the CPUs properly via INIT INIT START). I'd still like to do that > for 5.3 though, as that'd be slightly bigger surgery, and conservatively > put things basically back to state they have been up to now for 5.2. > Seems reasonable to me. I would guess that it mostly works because SMI isn’t all that common and the window where it matters is short. Or maybe the SDM is misleading.
On Friday, May 31, 2019 6:19:52 PM CEST Josh Poimboeuf wrote: > On Fri, May 31, 2019 at 05:41:18PM +0200, Jiri Kosina wrote: > > On Fri, 31 May 2019, Josh Poimboeuf wrote: > > > > > The only question I'd have is if we have data on the power savings > > > difference between hlt and mwait. mwait seems to wake up on a lot of > > > different conditions which might negate its deeper sleep state. > > > > hlt wakes up on basically the same set of events, but has the > > auto-restarting semantics on some of them (especially SMM). So the wakeup > > frequency itself shouldn't really contribute to power consumption > > difference; it's the C-state that mwait allows CPU to enter. > > Ok. I reluctantly surrender :-) For your v4: > > Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com> > > It works as a short term fix, but it's fragile, and it does feel like > we're just adding more duct tape, as Andy said. OK, the v4 queued up then, thanks!
On Fri, May 31, 2019 at 02:22:27PM -0700, Andy Lutomirski wrote: > > > On May 31, 2019, at 2:05 PM, Jiri Kosina <jikos@kernel.org> wrote: > > > >> On Fri, 31 May 2019, Andy Lutomirski wrote: > >> > >> The Intel SDM Vol 3 34.10 says: > >> > >> If the HLT instruction is restarted, the processor will generate a > >> memory access to fetch the HLT instruction (if it is > >> not in the internal cache), and execute a HLT bus transaction. This > >> behavior results in multiple HLT bus transactions > >> for the same HLT instruction. > > > > Which basically means that both hibernation and kexec have been broken in > > this respect for gazillions of years, and seems like noone noticed. Makes > > one wonder what the reason for that might be. > > > > Either SDM is not precise and the refetch actually never happens for real > > (or is always in these cases satisfied from I$ perhaps?), or ... ? > > > > So my patch basically puts things back where they have been for ages > > (while mwait is obviously much worse, as that gets woken up by the write > > to the monitored address, which inevitably does happen during resume), but > > seems like SDM is suggesting that we've been in a grey zone wrt RSM at > > least for all those ages. > > > > So perhaps we really should ditch resume_play_dead() altogether > > eventually, and replace it with sending INIT IPI around instead (and then > > waking the CPUs properly via INIT INIT START). I'd still like to do that > > for 5.3 though, as that'd be slightly bigger surgery, and conservatively > > put things basically back to state they have been up to now for 5.2. > > > > > Seems reasonable to me. I would guess that it mostly works because SMI isn’t > all that common and the window where it matters is short. Or maybe the SDM > is misleading. For P6 and later, i.e. all modern CPUs, Intel processors go straight to halted state and don't fetch/decode the HLT instruction. P5 actually did a fetch, but from what I can tell that behavior wasn't carried forward to KNC, unlike other legacy interrupt crud from P5: [1] https://lkml.kernel.org/r/20190430004504.GH31379@linux.intel.com
On Mon, 3 Jun 2019, Sean Christopherson wrote: > For P6 and later, i.e. all modern CPUs, Intel processors go straight to > halted state and don't fetch/decode the HLT instruction. That'd be a rather relieving fact actually. Do you happen to know if this is stated in some Intel documentation and we've just overlooked it, or whether it's rather an information that's being carried over from generation to generation by whispering through grapevine? Thanks,
On Mon, Jun 03, 2019 at 05:24:26PM +0200, Jiri Kosina wrote: > On Mon, 3 Jun 2019, Sean Christopherson wrote: > > > For P6 and later, i.e. all modern CPUs, Intel processors go straight to > > halted state and don't fetch/decode the HLT instruction. > > That'd be a rather relieving fact actually. Do you happen to know if this > is stated in some Intel documentation and we've just overlooked it, or > whether it's rather an information that's being carried over from > generation to generation by whispering through grapevine? I highly doubt it's officially stated anywhere. Intel's approach to this type of micro-architecture specific behavior is (usually) to word the SDM in such a way that both approaches are legal. E.g. a 1993 version of the SDM says "Returns to interrupted HLT instruction", whereas in 1995, which just so happens to coincide with the introduction of the P6 architecture, the SDM started saying "Returns to HALT state" and added the blurb about "will generate a memory access to fetch the HLT instruction (if it is not in the internal cache)" so that the old behavior is still legal. All that being said, the "straight to HALT" behavior is now the de facto standard since lots of people will yell loudly if it changes.
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index a7d966964c6f..513ce09e9950 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -299,7 +299,17 @@ int hibernate_resume_nonboot_cpu_disable(void)
          * address in its instruction pointer may not be possible to resolve
          * any more at that point (the page tables used by it previously may
          * have been overwritten by hibernate image data).
+         *
+         * First, make sure that we wake up all the potentially disabled SMT
+         * threads which have been initially brought up and then put into
+         * mwait/cpuidle sleep.
+         * Those will be put to proper (not interfering with hibernation
+         * resume) sleep afterwards, and the resumed kernel will decide itself
+         * what to do with them.
          */
+        ret = cpuhp_smt_enable();
+        if (ret)
+                return ret;
         smp_ops.play_dead = resume_play_dead;
         ret = disable_nonboot_cpus();
         smp_ops.play_dead = play_dead;
diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
index 4845b8c7be7f..fc413717a45f 100644
--- a/arch/x86/power/hibernate.c
+++ b/arch/x86/power/hibernate.c
@@ -11,6 +11,7 @@
 #include <linux/suspend.h>
 #include <linux/scatterlist.h>
 #include <linux/kdebug.h>
+#include <linux/cpu.h>

 #include <crypto/hash.h>

@@ -245,3 +246,35 @@ int relocate_restore_code(void)
         __flush_tlb_all();
         return 0;
 }
+
+int arch_resume_nosmt(void)
+{
+        int ret = 0;
+        /*
+         * We reached this while coming out of hibernation. This means
+         * that SMT siblings are sleeping in hlt, as mwait is not safe
+         * against control transition during resume (see comment in
+         * hibernate_resume_nonboot_cpu_disable()).
+         *
+         * If the resumed kernel has SMT disabled, we have to take all the
+         * SMT siblings out of hlt, and offline them again so that they
+         * end up in mwait proper.
+         *
+         * Called with hotplug disabled.
+         */
+        cpu_hotplug_enable();
+        if (cpu_smt_control == CPU_SMT_DISABLED ||
+            cpu_smt_control == CPU_SMT_FORCE_DISABLED) {
+                enum cpuhp_smt_control old = cpu_smt_control;
+
+                ret = cpuhp_smt_enable();
+                if (ret)
+                        goto out;
+                ret = cpuhp_smt_disable(old);
+                if (ret)
+                        goto out;
+        }
+out:
+        cpu_hotplug_disable();
+        return ret;
+}
diff --git a/include/linux/cpu.h b/include/linux/cpu.h
index 3813fe45effd..fcb1386bb0d4 100644
--- a/include/linux/cpu.h
+++ b/include/linux/cpu.h
@@ -201,10 +201,14 @@ enum cpuhp_smt_control {
 extern enum cpuhp_smt_control cpu_smt_control;
 extern void cpu_smt_disable(bool force);
 extern void cpu_smt_check_topology(void);
+extern int cpuhp_smt_enable(void);
+extern int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval);
 #else
 # define cpu_smt_control (CPU_SMT_NOT_IMPLEMENTED)
 static inline void cpu_smt_disable(bool force) { }
 static inline void cpu_smt_check_topology(void) { }
+static inline int cpuhp_smt_enable(void) { return 0; }
+static inline int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval) { return 0; }
 #endif

 /*
diff --git a/kernel/cpu.c b/kernel/cpu.c
index f2ef10460698..077fde6fb953 100644
--- a/kernel/cpu.c
+++ b/kernel/cpu.c
@@ -2061,7 +2061,7 @@ static void cpuhp_online_cpu_device(unsigned int cpu)
         kobject_uevent(&dev->kobj, KOBJ_ONLINE);
 }

-static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
+int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
 {
         int cpu, ret = 0;

@@ -2093,7 +2093,7 @@ static int cpuhp_smt_disable(enum cpuhp_smt_control ctrlval)
         return ret;
 }

-static int cpuhp_smt_enable(void)
+int cpuhp_smt_enable(void)
 {
         int cpu, ret = 0;

diff --git a/kernel/power/hibernate.c b/kernel/power/hibernate.c
index c8c272df7154..b65635753e8e 100644
--- a/kernel/power/hibernate.c
+++ b/kernel/power/hibernate.c
@@ -257,6 +257,11 @@ void swsusp_show_speed(ktime_t start, ktime_t stop,
                         (kps % 1000) / 10);
 }

+__weak int arch_resume_nosmt(void)
+{
+        return 0;
+}
+
 /**
  * create_image - Create a hibernation image.
  * @platform_mode: Whether or not to use the platform driver.
@@ -324,6 +329,10 @@ static int create_image(int platform_mode)
  Enable_cpus:
         suspend_enable_secondary_cpus();

+        /* Allow architectures to do nosmt-specific post-resume dances */
+        if (!in_suspend)
+                error = arch_resume_nosmt();
+
  Platform_finish:
         platform_finish(platform_mode);