Message ID | 20230110211839.19572-1-rick.p.edgecombe@intel.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Series | [v2] x86/hibernate: Use fixmap for saving unmapped pages |
On Tue, Jan 10, 2023 at 10:19 PM Rick Edgecombe <rick.p.edgecombe@intel.com> wrote: > > Hibernate uses the direct map to read memory it saves to disk. Since > sometimes pages are not accessible on the direct map ("not present" on > x86), it has special case logic to temporarily make a page present. On x86 > these direct map addresses can be mapped at various page sizes, but the > logic works ok as long as the not present pages are always mapped as > PAGE_SIZE such that they don't require a split to map the region as > present. If the address was mapped not present by a larger page size, the > split may fail and hibernate would then try to read an address mapped not > present. > > Today on x86 there are no known cases of this (huge not present pages on > the direct map), but it has come up from time to time when developing > things that operate on the direct map. It blocked making > VM_FLUSH_RESET_PERMS support huge vmalloc when that came up, and also > has been a complication for various direct map protection efforts. > > This dependency is also pretty hidden and easily missed by people poking at > the direct map. For this reason, there are warnings in place to complain > but not handle this scenario. > > One way to make this more robust would be to create some new CPA > functionality that can know to map and reset the whole huge page in the > case of trying to map a subpage. But for simplicity and smaller code, just > make x86 hibernate have its own fixmap PTE that it can use to point > to 4k pages when it encounters an unmapped direct map page. > > Move do_copy_page() to a header such that it can be used in an arch > breakout. Rename it hib_copy_page() to be more hibernate specific since > it could appear in other files. > > Use __weak for the arch breakout because there is not a suitable arch > specific header to use the #define method. 
> > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > --- > > Rebase to v6.2-rc3 (original still applied) and resending per: > https://lore.kernel.org/lkml/CAJZ5v0i6cxGD+V6G+q-Y_Lp-ov51_zmkZr8ZGpCtqWV-e=BsLg@mail.gmail.com/ > > arch/x86/include/asm/fixmap.h | 3 +++ > arch/x86/power/hibernate.c | 10 ++++++++++ > include/linux/suspend.h | 13 +++++++++++++ > kernel/power/snapshot.c | 21 +++++++-------------- > 4 files changed, 33 insertions(+), 14 deletions(-) > > diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h > index d0dcefb5cc59..0fceed9a4152 100644 > --- a/arch/x86/include/asm/fixmap.h > +++ b/arch/x86/include/asm/fixmap.h > @@ -108,6 +108,9 @@ enum fixed_addresses { > #ifdef CONFIG_PARAVIRT_XXL > FIX_PARAVIRT_BOOTMAP, > #endif > +#ifdef CONFIG_HIBERNATION > + FIX_HIBERNATE, > +#endif > > #ifdef CONFIG_ACPI_APEI_GHES > /* Used for GHES mapping from assorted contexts */ > diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c > index 6f955eb1e163..473b6b5f6b7e 100644 > --- a/arch/x86/power/hibernate.c > +++ b/arch/x86/power/hibernate.c > @@ -147,6 +147,16 @@ int arch_hibernation_header_restore(void *addr) > return 0; > } > > +void copy_unmapped_page(void *dst, struct page *page) > +{ > + WARN_ON(!preempt_count()); I don't think the above is needed. The code using this function cannot be preempted anyway AFAICS. > + > + set_fixmap(FIX_HIBERNATE, page_to_phys(page)); > + __flush_tlb_all(); So do TLBs need to be flushed before copying every single page? Basically, they are all copied in one loop. 
> + hib_copy_page(dst, (void *)fix_to_virt(FIX_HIBERNATE)); > + clear_fixmap(FIX_HIBERNATE); > +} > + > int relocate_restore_code(void) > { > pgd_t *pgd; > diff --git a/include/linux/suspend.h b/include/linux/suspend.h > index cfe19a028918..0b19b910526e 100644 > --- a/include/linux/suspend.h > +++ b/include/linux/suspend.h > @@ -447,6 +447,19 @@ extern bool hibernation_available(void); > asmlinkage int swsusp_save(void); > extern struct pbe *restore_pblist; > int pfn_is_nosave(unsigned long pfn); > +void copy_unmapped_page(void *dst, struct page *page); > + > +/* > + * This is needed, because copy_page and memcpy are not usable for copying > + * task structs. > + */ > +static inline void hib_copy_page(long *dst, long *src) > +{ > + int n; > + > + for (n = PAGE_SIZE / sizeof(long); n; n--) > + *dst++ = *src++; > +} > > int hibernate_quiet_exec(int (*func)(void *data), void *data); > #else /* CONFIG_HIBERNATION */ > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c > index cd8b7b35f1e8..344c071f29d3 100644 > --- a/kernel/power/snapshot.c > +++ b/kernel/power/snapshot.c > @@ -1369,16 +1369,11 @@ static unsigned int count_data_pages(void) > return n; > } > > -/* > - * This is needed, because copy_page and memcpy are not usable for copying > - * task structs. 
> - */ > -static inline void do_copy_page(long *dst, long *src) > +void __weak copy_unmapped_page(void *dst, struct page *page) > { > - int n; > - > - for (n = PAGE_SIZE / sizeof(long); n; n--) > - *dst++ = *src++; > + hibernate_map_page(page); > + hib_copy_page(dst, page_address(page)); > + hibernate_unmap_page(page); > } > > /** > @@ -1392,11 +1387,9 @@ static inline void do_copy_page(long *dst, long *src) > static void safe_copy_page(void *dst, struct page *s_page) > { > if (kernel_page_present(s_page)) { > - do_copy_page(dst, page_address(s_page)); > + hib_copy_page(dst, page_address(s_page)); > } else { > - hibernate_map_page(s_page); > - do_copy_page(dst, page_address(s_page)); > - hibernate_unmap_page(s_page); > + copy_unmapped_page(dst, s_page); > } > } > > @@ -1417,7 +1410,7 @@ static void copy_data_page(unsigned long dst_pfn, unsigned long src_pfn) > if (PageHighMem(s_page)) { > src = kmap_atomic(s_page); > dst = kmap_atomic(d_page); > - do_copy_page(dst, src); > + hib_copy_page(dst, src); > kunmap_atomic(dst); > kunmap_atomic(src); > } else { > -- > 2.17.1 >
On Wed, 2023-01-11 at 21:35 +0100, Rafael J. Wysocki wrote: > On Tue, Jan 10, 2023 at 10:19 PM Rick Edgecombe > <rick.p.edgecombe@intel.com> wrote: > > > > Hibernate uses the direct map to read memory it saves to disk. > > Since > > sometimes pages are not accessible on the direct map ("not present" > > on > > x86), it has special case logic to temporarily make a page present. > > On x86 > > these direct map addresses can be mapped at various page sizes, but > > the > > logic works ok as long as the not present pages are always mapped > > as > > PAGE_SIZE such that they don't require a split to map the region as > > present. If the address was mapped not present by a larger page > > size, the > > split may fail and hibernate would then try to read an address > > mapped not > > present. > > > > Today on x86 there are no known cases of this (huge not present > > pages on > > the direct map), but it has come up from time to time when > > developing > > things that operate on the direct map. It blocked making > > VM_FLUSH_RESET_PERMS support huge vmalloc when that came up, and > > also > > has been a complication for various direct map protection efforts. > > > > This dependency is also pretty hidden and easily missed by people > > poking at > > the direct map. For this reason, there are warnings in place to > > complain > > but not handle this scenario. > > > > One way to make this more robust would be to create some new CPA > > functionality that can know to map and reset the whole huge page in > > the > > case of trying to map a subpage. But for simplicity and smaller > > code, just > > make x86 hibernate have its own fixmap PTE that it can use to point > > to 4k pages when it encounters an unmapped direct map page. > > > > Move do_copy_page() to a header such that it can be used in an arch > > breakout. Rename it hib_copy_page() to be more hibernate specific > > since > > it could appear in other files. 
> > > > Use __weak for the arch breakout because there is not a suitable > > arch > > specific header to use the #define method. > > > > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > --- > > > > Rebase to v6.2-rc3 (original still applied) and resending per: > > https://lore.kernel.org/lkml/CAJZ5v0i6cxGD+V6G+q-Y_Lp-ov51_zmkZr8ZGpCtqWV-e=BsLg@mail.gmail.com/ > > > > arch/x86/include/asm/fixmap.h | 3 +++ > > arch/x86/power/hibernate.c | 10 ++++++++++ > > include/linux/suspend.h | 13 +++++++++++++ > > kernel/power/snapshot.c | 21 +++++++-------------- > > 4 files changed, 33 insertions(+), 14 deletions(-) > > > > diff --git a/arch/x86/include/asm/fixmap.h > > b/arch/x86/include/asm/fixmap.h > > index d0dcefb5cc59..0fceed9a4152 100644 > > --- a/arch/x86/include/asm/fixmap.h > > +++ b/arch/x86/include/asm/fixmap.h > > @@ -108,6 +108,9 @@ enum fixed_addresses { > > #ifdef CONFIG_PARAVIRT_XXL > > FIX_PARAVIRT_BOOTMAP, > > #endif > > +#ifdef CONFIG_HIBERNATION > > + FIX_HIBERNATE, > > +#endif > > > > #ifdef CONFIG_ACPI_APEI_GHES > > /* Used for GHES mapping from assorted contexts */ > > diff --git a/arch/x86/power/hibernate.c > > b/arch/x86/power/hibernate.c > > index 6f955eb1e163..473b6b5f6b7e 100644 > > --- a/arch/x86/power/hibernate.c > > +++ b/arch/x86/power/hibernate.c > > @@ -147,6 +147,16 @@ int arch_hibernation_header_restore(void > > *addr) > > return 0; > > } > > > > +void copy_unmapped_page(void *dst, struct page *page) > > +{ > > + WARN_ON(!preempt_count()); > > I don't think the above is needed. The code using this function > cannot be preempted anyway AFAICS. The reason I thought it was useful was because this function is now defined in a header. Someone else might decide to use it. Does it seem more useful? > > > + > > + set_fixmap(FIX_HIBERNATE, page_to_phys(page)); > > + __flush_tlb_all(); > > So do TLBs need to be flushed before copying every single page? > Basically, they are all copied in one loop. 
It is only one fixmap entry so it needs to be flushed after changing the PTE to point to a different page. But this is only for the case of unmapped pages, the more common mapped pages are copied from the direct map like usual. > > > + hib_copy_page(dst, (void *)fix_to_virt(FIX_HIBERNATE)); > > + clear_fixmap(FIX_HIBERNATE); > > +} > > + > > int relocate_restore_code(void) > > { > > pgd_t *pgd; > > diff --git a/include/linux/suspend.h b/include/linux/suspend.h > > index cfe19a028918..0b19b910526e 100644 > > --- a/include/linux/suspend.h > > +++ b/include/linux/suspend.h > > @@ -447,6 +447,19 @@ extern bool hibernation_available(void); > > asmlinkage int swsusp_save(void); > > extern struct pbe *restore_pblist; > > int pfn_is_nosave(unsigned long pfn); > > +void copy_unmapped_page(void *dst, struct page *page); > > + > > +/* > > + * This is needed, because copy_page and memcpy are not usable for > > copying > > + * task structs. > > + */ > > +static inline void hib_copy_page(long *dst, long *src) > > +{ > > + int n; > > + > > + for (n = PAGE_SIZE / sizeof(long); n; n--) > > + *dst++ = *src++; > > +} > > > > int hibernate_quiet_exec(int (*func)(void *data), void *data); > > #else /* CONFIG_HIBERNATION */ > > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c > > index cd8b7b35f1e8..344c071f29d3 100644 > > --- a/kernel/power/snapshot.c > > +++ b/kernel/power/snapshot.c > > @@ -1369,16 +1369,11 @@ static unsigned int count_data_pages(void) > > return n; > > } > > > > -/* > > - * This is needed, because copy_page and memcpy are not usable for > > copying > > - * task structs. 
> > - */ > > -static inline void do_copy_page(long *dst, long *src) > > +void __weak copy_unmapped_page(void *dst, struct page *page) > > { > > - int n; > > - > > - for (n = PAGE_SIZE / sizeof(long); n; n--) > > - *dst++ = *src++; > > + hibernate_map_page(page); > > + hib_copy_page(dst, page_address(page)); > > + hibernate_unmap_page(page); > > } > > > > /** > > @@ -1392,11 +1387,9 @@ static inline void do_copy_page(long *dst, > > long *src) > > static void safe_copy_page(void *dst, struct page *s_page) > > { > > if (kernel_page_present(s_page)) { > > - do_copy_page(dst, page_address(s_page)); > > + hib_copy_page(dst, page_address(s_page)); > > } else { > > - hibernate_map_page(s_page); > > - do_copy_page(dst, page_address(s_page)); > > - hibernate_unmap_page(s_page); > > + copy_unmapped_page(dst, s_page); > > } > > } > > > > @@ -1417,7 +1410,7 @@ static void copy_data_page(unsigned long > > dst_pfn, unsigned long src_pfn) > > if (PageHighMem(s_page)) { > > src = kmap_atomic(s_page); > > dst = kmap_atomic(d_page); > > - do_copy_page(dst, src); > > + hib_copy_page(dst, src); > > kunmap_atomic(dst); > > kunmap_atomic(src); > > } else { > > -- > > 2.17.1 > >
On Wed, Jan 11, 2023 at 9:46 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Wed, 2023-01-11 at 21:35 +0100, Rafael J. Wysocki wrote: > > On Tue, Jan 10, 2023 at 10:19 PM Rick Edgecombe > > <rick.p.edgecombe@intel.com> wrote: > > > > > > Hibernate uses the direct map to read memory it saves to disk. > > > Since > > > sometimes pages are not accessible on the direct map ("not present" > > > on > > > x86), it has special case logic to temporarily make a page present. > > > On x86 > > > these direct map addresses can be mapped at various page sizes, but > > > the > > > logic works ok as long as the not present pages are always mapped > > > as > > > PAGE_SIZE such that they don't require a split to map the region as > > > present. If the address was mapped not present by a larger page > > > size, the > > > split may fail and hibernate would then try to read an address > > > mapped not > > > present. > > > > > > Today on x86 there are no known cases of this (huge not present > > > pages on > > > the direct map), but it has come up from time to time when > > > developing > > > things that operate on the direct map. It blocked making > > > VM_FLUSH_RESET_PERMS support huge vmalloc when that came up, and > > > also > > > has been a complication for various direct map protection efforts. > > > > > > This dependency is also pretty hidden and easily missed by people > > > poking at > > > the direct map. For this reason, there are warnings in place to > > > complain > > > but not handle this scenario. > > > > > > One way to make this more robust would be to create some new CPA > > > functionality that can know to map and reset the whole huge page in > > > the > > > case of trying to map a subpage. But for simplicity and smaller > > > code, just > > > make x86 hibernate have its own fixmap PTE that it can use to point > > > to 4k pages when it encounters an unmapped direct map page. 
> > > > > > Move do_copy_page() to a header such that it can be used in an arch > > > breakout. Rename it hib_copy_page() to be more hibernate specific > > > since > > > it could appear in other files. > > > > > > Use __weak for the arch breakout because there is not a suitable > > > arch > > > specific header to use the #define method. > > > > > > Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> > > > --- > > > > > > Rebase to v6.2-rc3 (original still applied) and resending per: > > > > https://lore.kernel.org/lkml/CAJZ5v0i6cxGD+V6G+q-Y_Lp-ov51_zmkZr8ZGpCtqWV-e=BsLg@mail.gmail.com/ > > > > > > arch/x86/include/asm/fixmap.h | 3 +++ > > > arch/x86/power/hibernate.c | 10 ++++++++++ > > > include/linux/suspend.h | 13 +++++++++++++ > > > kernel/power/snapshot.c | 21 +++++++-------------- > > > 4 files changed, 33 insertions(+), 14 deletions(-) > > > > > > diff --git a/arch/x86/include/asm/fixmap.h > > > b/arch/x86/include/asm/fixmap.h > > > index d0dcefb5cc59..0fceed9a4152 100644 > > > --- a/arch/x86/include/asm/fixmap.h > > > +++ b/arch/x86/include/asm/fixmap.h > > > @@ -108,6 +108,9 @@ enum fixed_addresses { > > > #ifdef CONFIG_PARAVIRT_XXL > > > FIX_PARAVIRT_BOOTMAP, > > > #endif > > > +#ifdef CONFIG_HIBERNATION > > > + FIX_HIBERNATE, > > > +#endif > > > > > > #ifdef CONFIG_ACPI_APEI_GHES > > > /* Used for GHES mapping from assorted contexts */ > > > diff --git a/arch/x86/power/hibernate.c > > > b/arch/x86/power/hibernate.c > > > index 6f955eb1e163..473b6b5f6b7e 100644 > > > --- a/arch/x86/power/hibernate.c > > > +++ b/arch/x86/power/hibernate.c > > > @@ -147,6 +147,16 @@ int arch_hibernation_header_restore(void > > > *addr) > > > return 0; > > > } > > > > > > +void copy_unmapped_page(void *dst, struct page *page) > > > +{ > > > + WARN_ON(!preempt_count()); > > > > I don't think the above is needed. The code using this function > > cannot be preempted anyway AFAICS. 
> > The reason I thought it was useful was because this function is now > defined in a header. Someone else might decide to use it. Does it seem > more useful? Well, it is exposed now, but only in order to allow the __weak function to be overridden. I don't think it is logically valid to use it anywhere beyond its original call site. To make that clear, I would call it something hibernation-specific, like hibernate_copy_unmapped_page() and I would add a kerneldoc comment to it to describe its intended use. Furthermore, I'm not sure about the new code layout. Personally, I would prefer hibernate_map_page() and hibernate_unmap_page() to be turned into __weak functions and possibly overridden by the arch code, which would allow the amount of changes to be reduced and do_copy_page() wouldn't need to be moved into the header any more. > > > > > + > > > + set_fixmap(FIX_HIBERNATE, page_to_phys(page)); > > > + __flush_tlb_all(); > > > > So do TLBs need to be flushed before copying every single page? > > Basically, they are all copied in one loop. > > It is only one fixmap entry so it needs to be flushed after changing > the PTE to point to a different page. But this is only for the case of > unmapped pages, the more common mapped pages are copied from the direct > map like usual. 
OK > > > > > + hib_copy_page(dst, (void *)fix_to_virt(FIX_HIBERNATE)); > > > + clear_fixmap(FIX_HIBERNATE); > > > +} > > > + > > > int relocate_restore_code(void) > > > { > > > pgd_t *pgd; > > > diff --git a/include/linux/suspend.h b/include/linux/suspend.h > > > index cfe19a028918..0b19b910526e 100644 > > > --- a/include/linux/suspend.h > > > +++ b/include/linux/suspend.h > > > @@ -447,6 +447,19 @@ extern bool hibernation_available(void); > > > asmlinkage int swsusp_save(void); > > > extern struct pbe *restore_pblist; > > > int pfn_is_nosave(unsigned long pfn); > > > +void copy_unmapped_page(void *dst, struct page *page); > > > + > > > +/* > > > + * This is needed, because copy_page and memcpy are not usable for > > > copying > > > + * task structs. > > > + */ > > > +static inline void hib_copy_page(long *dst, long *src) > > > +{ > > > + int n; > > > + > > > + for (n = PAGE_SIZE / sizeof(long); n; n--) > > > + *dst++ = *src++; > > > +} > > > > > > int hibernate_quiet_exec(int (*func)(void *data), void *data); > > > #else /* CONFIG_HIBERNATION */ > > > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c > > > index cd8b7b35f1e8..344c071f29d3 100644 > > > --- a/kernel/power/snapshot.c > > > +++ b/kernel/power/snapshot.c > > > @@ -1369,16 +1369,11 @@ static unsigned int count_data_pages(void) > > > return n; > > > } > > > > > > -/* > > > - * This is needed, because copy_page and memcpy are not usable for > > > copying > > > - * task structs. 
> > > - */ > > > -static inline void do_copy_page(long *dst, long *src) > > > +void __weak copy_unmapped_page(void *dst, struct page *page) > > > { > > > - int n; > > > - > > > - for (n = PAGE_SIZE / sizeof(long); n; n--) > > > - *dst++ = *src++; > > > + hibernate_map_page(page); > > > + hib_copy_page(dst, page_address(page)); > > > + hibernate_unmap_page(page); > > > } > > > > > > /** > > > @@ -1392,11 +1387,9 @@ static inline void do_copy_page(long *dst, > > > long *src) > > > static void safe_copy_page(void *dst, struct page *s_page) > > > { > > > if (kernel_page_present(s_page)) { > > > - do_copy_page(dst, page_address(s_page)); > > > + hib_copy_page(dst, page_address(s_page)); > > > } else { > > > - hibernate_map_page(s_page); > > > - do_copy_page(dst, page_address(s_page)); > > > - hibernate_unmap_page(s_page); > > > + copy_unmapped_page(dst, s_page); > > > } > > > } > > > > > > @@ -1417,7 +1410,7 @@ static void copy_data_page(unsigned long > > > dst_pfn, unsigned long src_pfn) > > > if (PageHighMem(s_page)) { > > > src = kmap_atomic(s_page); > > > dst = kmap_atomic(d_page); > > > - do_copy_page(dst, src); > > > + hib_copy_page(dst, src); > > > kunmap_atomic(dst); > > > kunmap_atomic(src); > > > } else { > > > -- > > > 2.17.1 > > >
On Thu, 2023-01-12 at 15:15 +0100, Rafael J. Wysocki wrote:
> > > I don't think the above is needed.  The code using this function
> > > cannot be preempted anyway AFAICS.
> >
> > The reason I thought it was useful was because this function is now
> > defined in a header. Someone else might decide to use it. Does it
> > seem more useful?
>
> Well, it is exposed now, but only in order to allow the __weak
> function to be overridden.  I don't think it is logically valid to use
> it anywhere beyond its original call site.
>
> To make that clear, I would call it something hibernation-specific,
> like hibernate_copy_unmapped_page() and I would add a kerneldoc
> comment to it to describe its intended use.

Ok, I'll change the name, that makes sense.

On the warning, ok, I'll drop it. But to me the code stands out as
questionable with the PTE change and only the local TLB flush. It's a
bit of a comment as code on a rare path.

> Furthermore, I'm not sure about the new code layout.
>
> Personally, I would prefer hibernate_map_page() and
> hibernate_unmap_page() to be turned into __weak functions and possibly
> overridden by the arch code, which would allow the amount of changes
> to be reduced and do_copy_page() wouldn't need to be moved into the
> header any more.

Currently hibernate_map_page() maps the page on the direct map and
doesn't return anything. This new code effectively creates a readable
alias in the fixmap. So it would have to return an address to use so
the core hibernate code would know where to copy from. Then it would
have to pass it back into hibernate_unmap_page() for the arch to decide
what to do to clean it up. I think it would be more complicated.

There are also already multiple paths in hibernate_map_page() that
would have to be duplicated in the arch versions.

So I see the idea, but I'm not sure it ends up better. Can we leave
this one?

Thanks,
Rick
On Thu, Jan 12, 2023 at 8:15 PM Edgecombe, Rick P <rick.p.edgecombe@intel.com> wrote: > > On Thu, 2023-01-12 at 15:15 +0100, Rafael J. Wysocki wrote: > > > > > > > > I don't think the above is needed. The code using this function > > > > cannot be preempted anyway AFAICS. > > > > > > The reason I thought it was useful was because this function is now > > > defined in a header. Someone else might decide to use it. Does it > > > seem > > > more useful? > > > > Well, it is exposed now, but only in order to allow the __weak > > function to be overridden. I don't think it is logically valid to > > use > > it anywhere beyond its original call site. > > > > To make that clear, I would call it something hibernation-specific, > > like hibernate_copy_unmapped_page() and I would add a kerneldoc > > comment to it to describe its intended use. > > Ok, I'll change the name, that makes sense. > > On the warning, ok, I'll drop it. But to me the code stand out as > questionable with the PTE change and only the local TLB flush. It's a > bit of a comment as code on a rare path. > > > > > Furthermore, I'm not sure about the new code layout. > > > > Personally, I would prefer hibernate_map_page() and > > hibernate_unmap_page() to be turned into __weak functions and > > possibly > > overridden by the arch code, which would allow the amount of changes > > to be reduced and do_copy_page() wouldn't need to be moved into the > > header any more. > > Currently hibernate_map_page() maps the page on the direct map and > doesn't return anything. This new code effectively creates a readable > alias in the fixmap. So it would have to return an address to use so > the core hibernate code would know where to copy from. Then it would > have to pass it back into hibernate_unmap_page() for the arch to decide > what to do to clean it up. I think it would be more complicated. 
AFAICS, you only need hibernate_map_page() to return an address that
will be passed to do_copy_page() (that doesn't need to be touched
otherwise) and hibernate_unmap_page() would just do

clear_fixmap(FIX_HIBERNATE);

so it may as well take the page as the argument.

> There is also already multiple paths in hibernate_map_page() that would
> have to be duplicated in the arch versions.

But only if the arch decides to override the originals.

Now, the only caller of both hibernate_map_page() and
hibernate_unmap_page() is safe_copy_page() in the "else" branch.  Your
current patch replaces that branch completely with an arch version, so
they are not going to be invoked on x86 anyway.

> So I see the idea, but I'm not sure it ends up better. Can we leave this one?

I first need to be convinced that it is indeed better.
On Thu, 2023-01-12 at 20:37 +0100, Rafael J. Wysocki wrote:
> > So I see the idea, but I'm not sure it ends up better. Can we leave
> > this one?
>
> I first need to be convinced that it is indeed better.

I'll give it a try. And good point about not needing to reproduce the
functionality in the default hibernate_map_page(). I see that x86
doesn't need the other options.
```diff
diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
index d0dcefb5cc59..0fceed9a4152 100644
--- a/arch/x86/include/asm/fixmap.h
+++ b/arch/x86/include/asm/fixmap.h
@@ -108,6 +108,9 @@ enum fixed_addresses {
 #ifdef CONFIG_PARAVIRT_XXL
 	FIX_PARAVIRT_BOOTMAP,
 #endif
+#ifdef CONFIG_HIBERNATION
+	FIX_HIBERNATE,
+#endif
 
 #ifdef CONFIG_ACPI_APEI_GHES
 	/* Used for GHES mapping from assorted contexts */
diff --git a/arch/x86/power/hibernate.c b/arch/x86/power/hibernate.c
index 6f955eb1e163..473b6b5f6b7e 100644
--- a/arch/x86/power/hibernate.c
+++ b/arch/x86/power/hibernate.c
@@ -147,6 +147,16 @@ int arch_hibernation_header_restore(void *addr)
 	return 0;
 }
 
+void copy_unmapped_page(void *dst, struct page *page)
+{
+	WARN_ON(!preempt_count());
+
+	set_fixmap(FIX_HIBERNATE, page_to_phys(page));
+	__flush_tlb_all();
+	hib_copy_page(dst, (void *)fix_to_virt(FIX_HIBERNATE));
+	clear_fixmap(FIX_HIBERNATE);
+}
+
 int relocate_restore_code(void)
 {
 	pgd_t *pgd;
diff --git a/include/linux/suspend.h b/include/linux/suspend.h
index cfe19a028918..0b19b910526e 100644
--- a/include/linux/suspend.h
+++ b/include/linux/suspend.h
@@ -447,6 +447,19 @@ extern bool hibernation_available(void);
 asmlinkage int swsusp_save(void);
 extern struct pbe *restore_pblist;
 int pfn_is_nosave(unsigned long pfn);
+void copy_unmapped_page(void *dst, struct page *page);
+
+/*
+ * This is needed, because copy_page and memcpy are not usable for copying
+ * task structs.
+ */
+static inline void hib_copy_page(long *dst, long *src)
+{
+	int n;
+
+	for (n = PAGE_SIZE / sizeof(long); n; n--)
+		*dst++ = *src++;
+}
 
 int hibernate_quiet_exec(int (*func)(void *data), void *data);
 #else /* CONFIG_HIBERNATION */
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c
index cd8b7b35f1e8..344c071f29d3 100644
--- a/kernel/power/snapshot.c
+++ b/kernel/power/snapshot.c
@@ -1369,16 +1369,11 @@ static unsigned int count_data_pages(void)
 	return n;
 }
 
-/*
- * This is needed, because copy_page and memcpy are not usable for copying
- * task structs.
- */
-static inline void do_copy_page(long *dst, long *src)
+void __weak copy_unmapped_page(void *dst, struct page *page)
 {
-	int n;
-
-	for (n = PAGE_SIZE / sizeof(long); n; n--)
-		*dst++ = *src++;
+	hibernate_map_page(page);
+	hib_copy_page(dst, page_address(page));
+	hibernate_unmap_page(page);
 }
 
 /**
@@ -1392,11 +1387,9 @@ static inline void do_copy_page(long *dst, long *src)
 static void safe_copy_page(void *dst, struct page *s_page)
 {
 	if (kernel_page_present(s_page)) {
-		do_copy_page(dst, page_address(s_page));
+		hib_copy_page(dst, page_address(s_page));
 	} else {
-		hibernate_map_page(s_page);
-		do_copy_page(dst, page_address(s_page));
-		hibernate_unmap_page(s_page);
+		copy_unmapped_page(dst, s_page);
 	}
 }
 
@@ -1417,7 +1410,7 @@ static void copy_data_page(unsigned long dst_pfn, unsigned long src_pfn)
 	if (PageHighMem(s_page)) {
 		src = kmap_atomic(s_page);
 		dst = kmap_atomic(d_page);
-		do_copy_page(dst, src);
+		hib_copy_page(dst, src);
 		kunmap_atomic(dst);
 		kunmap_atomic(src);
 	} else {
```
Hibernate uses the direct map to read memory it saves to disk. Since
sometimes pages are not accessible on the direct map ("not present" on
x86), it has special case logic to temporarily make a page present. On x86
these direct map addresses can be mapped at various page sizes, but the
logic works ok as long as the not present pages are always mapped as
PAGE_SIZE such that they don't require a split to map the region as
present. If the address was mapped not present by a larger page size, the
split may fail and hibernate would then try to read an address mapped not
present.

Today on x86 there are no known cases of this (huge not present pages on
the direct map), but it has come up from time to time when developing
things that operate on the direct map. It blocked making
VM_FLUSH_RESET_PERMS support huge vmalloc when that came up, and also
has been a complication for various direct map protection efforts.

This dependency is also pretty hidden and easily missed by people poking at
the direct map. For this reason, there are warnings in place to complain
but not handle this scenario.

One way to make this more robust would be to create some new CPA
functionality that can know to map and reset the whole huge page in the
case of trying to map a subpage. But for simplicity and smaller code, just
make x86 hibernate have its own fixmap PTE that it can use to point
to 4k pages when it encounters an unmapped direct map page.

Move do_copy_page() to a header such that it can be used in an arch
breakout. Rename it hib_copy_page() to be more hibernate specific since
it could appear in other files.

Use __weak for the arch breakout because there is not a suitable arch
specific header to use the #define method.
Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com>
---

Rebase to v6.2-rc3 (original still applied) and resending per:
https://lore.kernel.org/lkml/CAJZ5v0i6cxGD+V6G+q-Y_Lp-ov51_zmkZr8ZGpCtqWV-e=BsLg@mail.gmail.com/

 arch/x86/include/asm/fixmap.h |  3 +++
 arch/x86/power/hibernate.c    | 10 ++++++++++
 include/linux/suspend.h       | 13 +++++++++++++
 kernel/power/snapshot.c       | 21 +++++++--------------
 4 files changed, 33 insertions(+), 14 deletions(-)