Message ID | 1369062192-7066-1-git-send-email-mhocko@suse.cz (mailing list archive) |
---|---|
State | Superseded, archived |
Headers | show |
On 05/20/2013 08:03 AM, Michal Hocko wrote: > This reverts commit 78d77df71510a96e042de7ba6dbd7998103642cb because > it breaks resume from suspend to ram. Git bisect pointed to this patch > and the revert fixes the problem. > > There are no error messages neither during suspend not on resume when > the machine simply starts booting as if it wasn't suspended in RAM at > all. Machine details and configuration, please? This is one of a series of extremely bizarre suspend to RAM failures we are trying to make sense of... in this case, it means that somehow switching from setting the NX bit conditionally to unconditionally somehow fixes suspend to RAM! Needless to say, this is not just bizarre, this is extremely disturbing. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon 20-05-13 08:58:55, H. Peter Anvin wrote: > On 05/20/2013 08:03 AM, Michal Hocko wrote: > > This reverts commit 78d77df71510a96e042de7ba6dbd7998103642cb because > > it breaks resume from suspend to ram. Git bisect pointed to this patch > > and the revert fixes the problem. > > > > There are no error messages neither during suspend not on resume when > > the machine simply starts booting as if it wasn't suspended in RAM at > > all. > > Machine details and configuration, please? The configuration has been posted in the referenced thread: https://lkml.org/lkml/2013/5/14/398 dmicode and lspci -vvv are attached. What else you would be interested in? > This is one of a series of > extremely bizarre suspend to RAM failures we are trying to make sense > of... in this case, it means that somehow switching from setting the NX > bit conditionally to unconditionally somehow fixes suspend to RAM! Yes I was really surprised when I bisected to the patch. I was wondering how an init code could influence s2ram... > > Needless to say, this is not just bizarre, this is extremely disturbing. > > -hpa > >
On Mon, May 20, 2013 at 9:43 AM, Michal Hocko <mhocko@suse.cz> wrote: > > The configuration has been posted in the referenced thread: > https://lkml.org/lkml/2013/5/14/398 Hmm. That's a regular intel i5 Sandybridge CPU. Which certainly has NX. Hmm. secondary_startup_64 isn't __initcode, is it? I think it's used for resuming the other CPU's too and for CPU hotplug. Yes/no? If so early_pmd_flags shouldn't be __initdata, I think. Michal, does it work if you don't do the revert, but just remove the __initdata instead? Linus -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/20/2013 10:21 AM, Linus Torvalds wrote: > On Mon, May 20, 2013 at 9:43 AM, Michal Hocko <mhocko@suse.cz> wrote: >> >> The configuration has been posted in the referenced thread: >> https://lkml.org/lkml/2013/5/14/398 > > Hmm. That's a regular intel i5 Sandybridge CPU. Which certainly has NX. > > Hmm. secondary_startup_64 isn't __initcode, is it? I think it's used > for resuming the other CPU's too and for CPU hotplug. Yes/no? > > If so early_pmd_flags shouldn't be __initdata, I think. > > Michal, does it work if you don't do the revert, but just remove the > __initdata instead? > Indeed. Looks like we corrupt a random bit in memory on resume. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Mon, May 20, 2013 at 11:13 AM, H. Peter Anvin <hpa@linux.intel.com> wrote: > > Indeed. Looks like we corrupt a random bit in memory on resume. Well, in this case, it's probably the DEBUG_PAGEALLOC that Michal has enabled, which causes a page fault - and then a triple fault - at secondary CPU startup. And the triple fault then results in a reboot. Which explains the symptoms he gets. Without DEBUG_PAGEALLOC it ends up being "just" a single-bit corruption, yes. Linus -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/20/2013 11:18 AM, Linus Torvalds wrote: > On Mon, May 20, 2013 at 11:13 AM, H. Peter Anvin <hpa@linux.intel.com> wrote: >> >> Indeed. Looks like we corrupt a random bit in memory on resume. > > Well, in this case, it's probably the DEBUG_PAGEALLOC that Michal has > enabled, which causes a page fault - and then a triple fault - at > secondary CPU startup. And the triple fault then results in a reboot. > Which explains the symptoms he gets. > > Without DEBUG_PAGEALLOC it ends up being "just" a single-bit corruption, yes. > Good, that explains what made it predictable. Score one for DEBUG_PAGEALLOC. -hpa -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kernel/head64.c b/arch/x86/kernel/head64.c index dab95a8..101ac1a9 100644 --- a/arch/x86/kernel/head64.c +++ b/arch/x86/kernel/head64.c @@ -34,7 +34,6 @@ extern pgd_t early_level4_pgt[PTRS_PER_PGD]; extern pmd_t early_dynamic_pgts[EARLY_DYNAMIC_PAGE_TABLES][PTRS_PER_PMD]; static unsigned int __initdata next_early_pgt = 2; -pmdval_t __initdata early_pmd_flags = __PAGE_KERNEL_LARGE & ~(_PAGE_GLOBAL | _PAGE_NX); /* Wipe all early page tables except for the kernel symbol map */ static void __init reset_early_page_tables(void) @@ -100,7 +99,7 @@ again: pmd_p[i] = 0; *pud_p = (pudval_t)pmd_p - __START_KERNEL_map + phys_base + _KERNPG_TABLE; } - pmd = (physaddr & PMD_MASK) + early_pmd_flags; + pmd = (physaddr & PMD_MASK) + (__PAGE_KERNEL_LARGE & ~_PAGE_GLOBAL); pmd_p[pmd_index(address)] = pmd; return 0; diff --git a/arch/x86/kernel/head_64.S b/arch/x86/kernel/head_64.S index 08f7e80..6859e96 100644 --- a/arch/x86/kernel/head_64.S +++ b/arch/x86/kernel/head_64.S @@ -200,7 +200,6 @@ ENTRY(secondary_startup_64) btl $20,%edi /* No Execute supported? */ jnc 1f btsl $_EFER_NX, %eax - btsq $_PAGE_BIT_NX,early_pmd_flags(%rip) 1: wrmsr /* Make changes effective */ /* Setup cr0 */
This reverts commit 78d77df71510a96e042de7ba6dbd7998103642cb because it breaks resume from suspend to ram. Git bisect pointed to this patch and the revert fixes the problem. There are no error messages neither during suspend not on resume when the machine simply starts booting as if it wasn't suspended in RAM at all. References: https://lkml.org/lkml/2013/5/14/398 Cc: stable@vger.kernel.org # 3.9 Signed-off-by: Michal Hocko <mhocko@suse.cz> --- arch/x86/kernel/head64.c | 3 +-- arch/x86/kernel/head_64.S | 1 - 2 files changed, 1 insertion(+), 3 deletions(-)