Message ID | CAGXu5jJ+hRAVNfa=xWd9c9On5vC6k+kZKkaXGSXS5WfjTK2QcA@mail.gmail.com (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: > On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: >> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote: >>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: >>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: >>>>> >>>>> * Logan Gunthorpe <logang@deltatee.com> wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I have been working on a bug that causes my laptop to freeze during >>>>>> resume from hibernation. I did a bisect to find the offending commit: >>>>>> >>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>>>>> >>>>>> There is more information in the bugzilla report [1] that >>>>>> I've been working on but I will summarize things below. >>>>>> >>>>>> I've experienced intermittent but reproducible freezes when resuming >>>>>> from hibernation since about kernel version 3.19. The freeze was >>>>>> significantly more reproducible when a few applications were loaded >>>>>> before hibernation and would largely not happen if hibernated >>>>>> immediately after booting to a desktop. I did some tracing work to find >>>>>> that the kernel gets as far as the resume_image call in >>>>>> swsusp_arch_resume and I could not find any response from the image >>>>>> kernel when I hit the bug. I also did testing that seemed to rule out >>>>>> this being caused by a problematic driver. >>>>>> >>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in >>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. >>>>>> Then, I did a second bisect with a ported version of the fix to the >>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation >>>>>> with what appears to be the exact same symptoms. Reverting that commit >>>>>> in recent kernels up to and including 4.6 fixes the issue and restores >>>>>> reliable hibernation. However, it's not at all clear to me why that >>>>>> commit would cause this issue or how to fix the issue without reverting. >>>>> >>>>> I've attached that commit below and also Cc:-ed a few more people who might have >>>>> an idea about why this regressed. Worst-case we'll have to revert it. >>>> >>>> Without looking deep into mm, my theory would be that after this patch >>>> the final jump from the boot kernel to the image kernel's trampoline >>>> code during resume may crash the kernel if the trampoline page turns >>>> out to be NX in the boot kernel (it has to be executable in both the >>>> boot and the image kernels). >>> >>> So, pardon my ignorance, but where is this trampoline page placed in >>> kernel memory? >> >> On 32-bit its location has to be the same in both the boot and the >> image kernels and that's within kernel text in both cases, so that >> shouldn't be a problem. >> >> On 64-bit its location depends on the image kernel and specifically on >> the location of the restore_registers routine in it. The (virtual) >> address of that routine is stored in the restore_jump_address >> variable, so the page containing it (the trampoline page) can be found >> with the help of that. >> >> swsusp_arch_resume() sets up a temporary kernel mapping to finalize >> the image restoration and that page must not be NX in that mapping for >> things to work. > > It looks like nothing in the swsusp_arch_resume() -> get_safe_page() > -> get_image_page() path sets the page executable... > > Untested, but I wonder if this work work in swsusp_arch_resume() > before the memcpy? I can't type today, it seems. It should read "... if this would work ..." If you can test this and it works for you, I'll send a proper patch... :P -Kees > > (apologies for any gmail-based whitespace mangling...) > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > index 009947d419a6..c2f3ecc45bd4 100644 > --- a/arch/x86/power/hibernate_64.c > +++ b/arch/x86/power/hibernate_64.c > @@ -12,6 +12,7 @@ > #include <linux/smp.h> > #include <linux/suspend.h> > > +#include <asm/cacheflush.h> > #include <asm/init.h> > #include <asm/proto.h> > #include <asm/page.h> > @@ -89,6 +90,7 @@ int swsusp_arch_resume(void) > relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC); > if (!relocated_restore_code) > return -ENOMEM; > + set_memory_x((unsigned long)relocated_restore_code, 1); > memcpy(relocated_restore_code, &core_restore_code, > &restore_registers - &core_restore_code); > > > -Kees > > -- > Kees Cook > Chrome OS & Brillo Security
On Fri, May 20, 2016 at 6:57 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > On 20/05/16 04:16 PM, Kees Cook wrote: >> >> On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org> wrote: >>> >>> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> >>> wrote: >>>> >>>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> >>>> wrote: >>>>> >>>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: >>>>>> >>>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: >>>>>>> >>>>>>> >>>>>>> * Logan Gunthorpe <logang@deltatee.com> wrote: >>>>>>> >>>>>>>> Hi, >>>>>>>> >>>>>>>> I have been working on a bug that causes my laptop to freeze during >>>>>>>> resume from hibernation. I did a bisect to find the offending >>>>>>>> commit: >>>>>>>> >>>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>>>>>>> >>>>>>>> There is more information in the bugzilla report [1] that >>>>>>>> I've been working on but I will summarize things below. >>>>>>>> >>>>>>>> I've experienced intermittent but reproducible freezes when resuming >>>>>>>> from hibernation since about kernel version 3.19. The freeze was >>>>>>>> significantly more reproducible when a few applications were loaded >>>>>>>> before hibernation and would largely not happen if hibernated >>>>>>>> immediately after booting to a desktop. I did some tracing work to >>>>>>>> find >>>>>>>> that the kernel gets as far as the resume_image call in >>>>>>>> swsusp_arch_resume and I could not find any response from the image >>>>>>>> kernel when I hit the bug. I also did testing that seemed to rule >>>>>>>> out >>>>>>>> this being caused by a problematic driver. >>>>>>>> >>>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in >>>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in >>>>>>>> 4.4. >>>>>>>> Then, I did a second bisect with a ported version of the fix to the >>>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break >>>>>>>> hibernation >>>>>>>> with what appears to be the exact same symptoms. Reverting that >>>>>>>> commit >>>>>>>> in recent kernels up to and including 4.6 fixes the issue and >>>>>>>> restores >>>>>>>> reliable hibernation. However, it's not at all clear to me why that >>>>>>>> commit would cause this issue or how to fix the issue without >>>>>>>> reverting. >>>>>>> >>>>>>> >>>>>>> I've attached that commit below and also Cc:-ed a few more people who >>>>>>> might have >>>>>>> an idea about why this regressed. Worst-case we'll have to revert it. >>>>>> >>>>>> >>>>>> Without looking deep into mm, my theory would be that after this patch >>>>>> the final jump from the boot kernel to the image kernel's trampoline >>>>>> code during resume may crash the kernel if the trampoline page turns >>>>>> out to be NX in the boot kernel (it has to be executable in both the >>>>>> boot and the image kernels). >>>>> >>>>> >>>>> So, pardon my ignorance, but where is this trampoline page placed in >>>>> kernel memory? >>>> >>>> >>>> On 32-bit its location has to be the same in both the boot and the >>>> image kernels and that's within kernel text in both cases, so that >>>> shouldn't be a problem. >>>> >>>> On 64-bit its location depends on the image kernel and specifically on >>>> the location of the restore_registers routine in it. The (virtual) >>>> address of that routine is stored in the restore_jump_address >>>> variable, so the page containing it (the trampoline page) can be found >>>> with the help of that. >>>> >>>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize >>>> the image restoration and that page must not be NX in that mapping for >>>> things to work. >>> >>> >>> It looks like nothing in the swsusp_arch_resume() -> get_safe_page() >>> -> get_image_page() path sets the page executable... >>> >>> Untested, but I wonder if this work work in swsusp_arch_resume() >>> before the memcpy? >> >> >> I can't type today, it seems. It should read "... if this would work ..." >> >> If you can test this and it works for you, I'll send a proper patch... :P >> >> -Kees >> > > Hi Kees, > > Thanks. I tried the patch but it only resulted in a kernel warning and > freeze. I've attached a photo showing as much of the messages as I could > get. > > Logan Ah, dang, ok, thanks for trying it. I'll let Rafael try to figure this one out. -Kees
On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > Hey, > > I've still be trying to figure this out as I have time. > > I tried printing a couple restore addresses and nothing I can find seems > anywhere near the rodata/ex_table boundary. > > I tried with the (badly formatted) below and got the following. Nothing too > surprising. I've attached a kallsyms that matches the kernel for reference. > > restore_code: ffff880157c3b000 > jump_addr: ffffffff81446be0 > > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > index 009947d..6efedb7 100644 > --- a/arch/x86/power/hibernate_64.c > +++ b/arch/x86/power/hibernate_64.c > @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) > memcpy(relocated_restore_code, &core_restore_code, > &restore_registers - &core_restore_code); > > + pr_info("restore_code: %p\n", relocated_restore_code); > + pr_info("jump_addr: %lx\n", restore_jump_address); > + Also interesting would be the "relocated_restore_code" address, as well as a dump of /sys/kernel/debug/kernel_page_tables (from CONFIG_X86_PTDUMP). I'm baffled by the problem, but the best I can understand is the the relocated_restore_code range isn't executable (which should be visible from finding it in /sys/kernel/debug/kernel_page_tables), but I don't see how to solve that since my original patch didn't work. Rafael, is this something you have time to look at quickly? -Kees > restore_image(); > return 0; > } > > > Thanks, > > Logan > > > > On 21/05/16 10:39 AM, Kees Cook wrote: >> >> On Fri, May 20, 2016 at 6:57 PM, Logan Gunthorpe <logang@deltatee.com> >> wrote: >>> >>> On 20/05/16 04:16 PM, Kees Cook wrote: >>>> >>>> >>>> On Fri, May 20, 2016 at 2:59 PM, Kees Cook <keescook@chromium.org> >>>> wrote: >>>>> >>>>> >>>>> On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> >>>>> wrote: >>>>>> >>>>>> >>>>>> On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> >>>>>> wrote: >>>>>>> >>>>>>> >>>>>>> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: >>>>>>>> >>>>>>>> >>>>>>>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> >>>>>>>> wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> * Logan Gunthorpe <logang@deltatee.com> wrote: >>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> >>>>>>>>>> I have been working on a bug that causes my laptop to freeze >>>>>>>>>> during >>>>>>>>>> resume from hibernation. I did a bisect to find the offending >>>>>>>>>> commit: >>>>>>>>>> >>>>>>>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata >>>>>>>>>> >>>>>>>>>> There is more information in the bugzilla report [1] that >>>>>>>>>> I've been working on but I will summarize things below. >>>>>>>>>> >>>>>>>>>> I've experienced intermittent but reproducible freezes when >>>>>>>>>> resuming >>>>>>>>>> from hibernation since about kernel version 3.19. The freeze was >>>>>>>>>> significantly more reproducible when a few applications were >>>>>>>>>> loaded >>>>>>>>>> before hibernation and would largely not happen if hibernated >>>>>>>>>> immediately after booting to a desktop. I did some tracing work to >>>>>>>>>> find >>>>>>>>>> that the kernel gets as far as the resume_image call in >>>>>>>>>> swsusp_arch_resume and I could not find any response from the >>>>>>>>>> image >>>>>>>>>> kernel when I hit the bug. I also did testing that seemed to rule >>>>>>>>>> out >>>>>>>>>> this being caused by a problematic driver. >>>>>>>>>> >>>>>>>>>> I did a successful bisect between 3.18 and 3.19 which found a bug >>>>>>>>>> in >>>>>>>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in >>>>>>>>>> 4.4. >>>>>>>>>> Then, I did a second bisect with a ported version of the fix to >>>>>>>>>> the >>>>>>>>>> first bug and found commit ab76f7b4ab in 4.3 to also break >>>>>>>>>> hibernation >>>>>>>>>> with what appears to be the exact same symptoms. Reverting that >>>>>>>>>> commit >>>>>>>>>> in recent kernels up to and including 4.6 fixes the issue and >>>>>>>>>> restores >>>>>>>>>> reliable hibernation. However, it's not at all clear to me why >>>>>>>>>> that >>>>>>>>>> commit would cause this issue or how to fix the issue without >>>>>>>>>> reverting. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> I've attached that commit below and also Cc:-ed a few more people >>>>>>>>> who >>>>>>>>> might have >>>>>>>>> an idea about why this regressed. Worst-case we'll have to revert >>>>>>>>> it. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Without looking deep into mm, my theory would be that after this >>>>>>>> patch >>>>>>>> the final jump from the boot kernel to the image kernel's trampoline >>>>>>>> code during resume may crash the kernel if the trampoline page turns >>>>>>>> out to be NX in the boot kernel (it has to be executable in both the >>>>>>>> boot and the image kernels). >>>>>>> >>>>>>> >>>>>>> >>>>>>> So, pardon my ignorance, but where is this trampoline page placed in >>>>>>> kernel memory? >>>>>> >>>>>> >>>>>> >>>>>> On 32-bit its location has to be the same in both the boot and the >>>>>> image kernels and that's within kernel text in both cases, so that >>>>>> shouldn't be a problem. >>>>>> >>>>>> On 64-bit its location depends on the image kernel and specifically on >>>>>> the location of the restore_registers routine in it. The (virtual) >>>>>> address of that routine is stored in the restore_jump_address >>>>>> variable, so the page containing it (the trampoline page) can be found >>>>>> with the help of that. >>>>>> >>>>>> swsusp_arch_resume() sets up a temporary kernel mapping to finalize >>>>>> the image restoration and that page must not be NX in that mapping for >>>>>> things to work. >>>>> >>>>> >>>>> >>>>> It looks like nothing in the swsusp_arch_resume() -> get_safe_page() >>>>> -> get_image_page() path sets the page executable... >>>>> >>>>> Untested, but I wonder if this work work in swsusp_arch_resume() >>>>> before the memcpy? >>>> >>>> >>>> >>>> I can't type today, it seems. It should read "... if this would work >>>> ..." >>>> >>>> If you can test this and it works for you, I'll send a proper patch... >>>> :P >>>> >>>> -Kees >>>> >>> >>> Hi Kees, >>> >>> Thanks. I tried the patch but it only resulted in a kernel warning and >>> freeze. I've attached a photo showing as much of the messages as I could >>> get. >>> >>> Logan >> >> >> Ah, dang, ok, thanks for trying it. I'll let Rafael try to figure this one >> out. >> >> -Kees >> >
Hey, On 10/06/16 12:09 PM, Kees Cook wrote: >> restore_code: ffff880157c3b000 >> jump_addr: ffffffff81446be0 >> >> >> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c >> index 009947d..6efedb7 100644 >> --- a/arch/x86/power/hibernate_64.c >> +++ b/arch/x86/power/hibernate_64.c >> @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) >> memcpy(relocated_restore_code, &core_restore_code, >> &restore_registers - &core_restore_code); >> >> + pr_info("restore_code: %p\n", relocated_restore_code); >> + pr_info("jump_addr: %lx\n", restore_jump_address); >> + > > Also interesting would be the "relocated_restore_code" address, as > well as a dump of /sys/kernel/debug/kernel_page_tables (from > CONFIG_X86_PTDUMP). Is that not what I printed? If not, can you give me a better hint as to what you're looking for so I can spin another kernel? I'll also provide the kernel_page_tables once I do that. > I'm baffled by the problem, but the best I can understand is the the > relocated_restore_code range isn't executable (which should be visible > from finding it in /sys/kernel/debug/kernel_page_tables), but I don't > see how to solve that since my original patch didn't work. Yeah this is definitely a baffling problem. Thanks, Logan -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Fri, Jun 10, 2016 at 11:16 AM, Logan Gunthorpe <logang@deltatee.com> wrote: > Hey, > > On 10/06/16 12:09 PM, Kees Cook wrote: >>> restore_code: ffff880157c3b000 >>> jump_addr: ffffffff81446be0 >>> >>> >>> diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c >>> index 009947d..6efedb7 100644 >>> --- a/arch/x86/power/hibernate_64.c >>> +++ b/arch/x86/power/hibernate_64.c >>> @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) >>> memcpy(relocated_restore_code, &core_restore_code, >>> &restore_registers - &core_restore_code); >>> >>> + pr_info("restore_code: %p\n", relocated_restore_code); >>> + pr_info("jump_addr: %lx\n", restore_jump_address); >>> + >> >> Also interesting would be the "relocated_restore_code" address, as >> well as a dump of /sys/kernel/debug/kernel_page_tables (from >> CONFIG_X86_PTDUMP). > > Is that not what I printed? If not, can you give me a better hint as to Oh, whoops, sorry, I saw "restore_code" in the pr_info and "relocate_restore_code" in the memcpy and didn't scan the right thing in the pr_info line. :) > what you're looking for so I can spin another kernel? I'll also provide > the kernel_page_tables once I do that. Cool, thanks. > >> I'm baffled by the problem, but the best I can understand is the the >> relocated_restore_code range isn't executable (which should be visible >> from finding it in /sys/kernel/debug/kernel_page_tables), but I don't >> see how to solve that since my original patch didn't work. > > Yeah this is definitely a baffling problem. -Kees
On Friday, June 10, 2016 11:09:22 AM Kees Cook wrote: > On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > > Hey, > > > > I've still be trying to figure this out as I have time. > > > > I tried printing a couple restore addresses and nothing I can find seems > > anywhere near the rodata/ex_table boundary. > > > > I tried with the (badly formatted) below and got the following. Nothing too > > surprising. I've attached a kallsyms that matches the kernel for reference. > > > > restore_code: ffff880157c3b000 > > jump_addr: ffffffff81446be0 > > > > > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > > index 009947d..6efedb7 100644 > > --- a/arch/x86/power/hibernate_64.c > > +++ b/arch/x86/power/hibernate_64.c > > @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) > > memcpy(relocated_restore_code, &core_restore_code, > > &restore_registers - &core_restore_code); > > > > + pr_info("restore_code: %p\n", relocated_restore_code); > > + pr_info("jump_addr: %lx\n", restore_jump_address); > > + > > Also interesting would be the "relocated_restore_code" address, as > well as a dump of /sys/kernel/debug/kernel_page_tables (from > CONFIG_X86_PTDUMP). > > I'm baffled by the problem, but the best I can understand is the the > relocated_restore_code range isn't executable (which should be visible > from finding it in /sys/kernel/debug/kernel_page_tables), but I don't > see how to solve that since my original patch didn't work. > > Rafael, is this something you have time to look at quickly? Well, not really, but I'll do my best to look at it in the next few days. Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, May 20, 2016 02:59:30 PM Kees Cook wrote: > On Fri, May 20, 2016 at 2:46 PM, Rafael J. Wysocki <rafael@kernel.org> wrote: > > On Fri, May 20, 2016 at 3:56 PM, Stephen Smalley <sds@tycho.nsa.gov> wrote: > >> On 05/20/2016 07:34 AM, Rafael J. Wysocki wrote: > >>> On Fri, May 20, 2016 at 9:15 AM, Ingo Molnar <mingo@kernel.org> wrote: > >>>> > >>>> * Logan Gunthorpe <logang@deltatee.com> wrote: > >>>> > >>>>> Hi, > >>>>> > >>>>> I have been working on a bug that causes my laptop to freeze during > >>>>> resume from hibernation. I did a bisect to find the offending commit: > >>>>> > >>>>> [ab76f7b4ab] x86/mm: Set NX on gap between __ex_table and rodata > >>>>> > >>>>> There is more information in the bugzilla report [1] that > >>>>> I've been working on but I will summarize things below. > >>>>> > >>>>> I've experienced intermittent but reproducible freezes when resuming > >>>>> from hibernation since about kernel version 3.19. The freeze was > >>>>> significantly more reproducible when a few applications were loaded > >>>>> before hibernation and would largely not happen if hibernated > >>>>> immediately after booting to a desktop. I did some tracing work to find > >>>>> that the kernel gets as far as the resume_image call in > >>>>> swsusp_arch_resume and I could not find any response from the image > >>>>> kernel when I hit the bug. I also did testing that seemed to rule out > >>>>> this being caused by a problematic driver. > >>>>> > >>>>> I did a successful bisect between 3.18 and 3.19 which found a bug in > >>>>> commit f5b2831d6 that was then later fixed by commit 55696b1f66 in 4.4. > >>>>> Then, I did a second bisect with a ported version of the fix to the > >>>>> first bug and found commit ab76f7b4ab in 4.3 to also break hibernation > >>>>> with what appears to be the exact same symptoms. Reverting that commit > >>>>> in recent kernels up to and including 4.6 fixes the issue and restores > >>>>> reliable hibernation. However, it's not at all clear to me why that > >>>>> commit would cause this issue or how to fix the issue without reverting. > >>>> > >>>> I've attached that commit below and also Cc:-ed a few more people who might have > >>>> an idea about why this regressed. Worst-case we'll have to revert it. > >>> > >>> Without looking deep into mm, my theory would be that after this patch > >>> the final jump from the boot kernel to the image kernel's trampoline > >>> code during resume may crash the kernel if the trampoline page turns > >>> out to be NX in the boot kernel (it has to be executable in both the > >>> boot and the image kernels). > >> > >> So, pardon my ignorance, but where is this trampoline page placed in > >> kernel memory? > > > > On 32-bit its location has to be the same in both the boot and the > > image kernels and that's within kernel text in both cases, so that > > shouldn't be a problem. > > > > On 64-bit its location depends on the image kernel and specifically on > > the location of the restore_registers routine in it. The (virtual) > > address of that routine is stored in the restore_jump_address > > variable, so the page containing it (the trampoline page) can be found > > with the help of that. > > > > swsusp_arch_resume() sets up a temporary kernel mapping to finalize > > the image restoration and that page must not be NX in that mapping for > > things to work. > > It looks like nothing in the swsusp_arch_resume() -> get_safe_page() > -> get_image_page() path sets the page executable... > > Untested, but I wonder if this work work in swsusp_arch_resume() > before the memcpy? > > (apologies for any gmail-based whitespace mangling...) > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > index 009947d419a6..c2f3ecc45bd4 100644 > --- a/arch/x86/power/hibernate_64.c > +++ b/arch/x86/power/hibernate_64.c > @@ -12,6 +12,7 @@ > #include <linux/smp.h> > #include <linux/suspend.h> > > +#include <asm/cacheflush.h> > #include <asm/init.h> > #include <asm/proto.h> > #include <asm/page.h> > @@ -89,6 +90,7 @@ int swsusp_arch_resume(void) > relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC); > if (!relocated_restore_code) > return -ENOMEM; > + set_memory_x((unsigned long)relocated_restore_code, 1); > memcpy(relocated_restore_code, &core_restore_code, > &restore_registers - &core_restore_code); > > We may not be on the right track with this I'm afraid. When the temporary kernel mapping is set up in swsusp_arch_resume(), it passes __PAGE_KERNEL_LARGE_EXEC as pmd_flag in the x86_mapping_info, so all memory should be executable after we switch to that which happens right at the beginning of restore_image(). So restore_image() and the code it jumps to should be fine from the executability perspective (including the final jump to restore_jump_address). Or am I missingy anything? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: > OK, I have a theory, but I need a bit of help. > > This may be a dumb question, but I don't quite remember the answer readily. > > Given a physical address, how do I get the corresponding virtual one under > the kernel identity mapping? Is that not just 'phys_to_virt(x)'?? Logan -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, June 10, 2016 11:27:29 PM Rafael J. Wysocki wrote: > On Friday, June 10, 2016 11:09:22 AM Kees Cook wrote: > > On Thu, Jun 9, 2016 at 9:14 PM, Logan Gunthorpe <logang@deltatee.com> wrote: > > > Hey, > > > > > > I've still be trying to figure this out as I have time. > > > > > > I tried printing a couple restore addresses and nothing I can find seems > > > anywhere near the rodata/ex_table boundary. > > > > > > I tried with the (badly formatted) below and got the following. Nothing too > > > surprising. I've attached a kallsyms that matches the kernel for reference. > > > > > > restore_code: ffff880157c3b000 > > > jump_addr: ffffffff81446be0 > > > > > > > > > diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c > > > index 009947d..6efedb7 100644 > > > --- a/arch/x86/power/hibernate_64.c > > > +++ b/arch/x86/power/hibernate_64.c > > > @@ -92,6 +92,9 @@ int swsusp_arch_resume(void) > > > memcpy(relocated_restore_code, &core_restore_code, > > > &restore_registers - &core_restore_code); > > > > > > + pr_info("restore_code: %p\n", relocated_restore_code); > > > + pr_info("jump_addr: %lx\n", restore_jump_address); > > > + > > > > Also interesting would be the "relocated_restore_code" address, as > > well as a dump of /sys/kernel/debug/kernel_page_tables (from > > CONFIG_X86_PTDUMP). > > > > I'm baffled by the problem, but the best I can understand is the the > > relocated_restore_code range isn't executable (which should be visible > > from finding it in /sys/kernel/debug/kernel_page_tables), but I don't > > see how to solve that since my original patch didn't work. > > > > Rafael, is this something you have time to look at quickly? > > Well, not really, but I'll do my best to look at it in the next few days. OK, I have a theory, but I need a bit of help. This may be a dumb question, but I don't quite remember the answer readily. Given a physical address, how do I get the corresponding virtual one under the kernel identity mapping? Thanks, Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Friday, June 10, 2016 04:28:01 PM Logan Gunthorpe wrote: > > On 10/06/16 04:29 PM, Rafael J. Wysocki wrote: > > OK, I have a theory, but I need a bit of help. > > > > This may be a dumb question, but I don't quite remember the answer readily. > > > > Given a physical address, how do I get the corresponding virtual one under > > the kernel identity mapping? > > Is that not just 'phys_to_virt(x)'?? Ah that. Thanks! Rafael -- To unsubscribe from this list: send the line "unsubscribe linux-pm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/power/hibernate_64.c b/arch/x86/power/hibernate_64.c index 009947d419a6..c2f3ecc45bd4 100644 --- a/arch/x86/power/hibernate_64.c +++ b/arch/x86/power/hibernate_64.c @@ -12,6 +12,7 @@ #include <linux/smp.h> #include <linux/suspend.h> +#include <asm/cacheflush.h> #include <asm/init.h> #include <asm/proto.h> #include <asm/page.h> @@ -89,6 +90,7 @@ int swsusp_arch_resume(void) relocated_restore_code = (void *)get_safe_page(GFP_ATOMIC); if (!relocated_restore_code) return -ENOMEM; + set_memory_x((unsigned long)relocated_restore_code, 1); memcpy(relocated_restore_code, &core_restore_code, &restore_registers - &core_restore_code);