Message ID | 20230929-hib_zero_bitmap_fix-v1-1-6cfdcb785250@quicinc.com (mailing list archive) |
---|---|
State | Changes Requested, archived |
Headers | show |
Series | PM: hibernate: Fix a bug in copying the zero bitmap to safe pages | expand |
On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti <quic_pkondeti@quicinc.com> wrote: > Hi Pavankumar, > The following crash is observed 100% of the time during resume from > the hibernation on a x86 QEMU system. > > [ 12.931887] ? __die_body+0x1a/0x60 > [ 12.932324] ? page_fault_oops+0x156/0x420 > [ 12.932824] ? search_exception_tables+0x37/0x50 > [ 12.933389] ? fixup_exception+0x21/0x300 > [ 12.933889] ? exc_page_fault+0x69/0x150 > [ 12.934371] ? asm_exc_page_fault+0x26/0x30 > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 > [ 12.935428] snapshot_write_next+0x7c/0x9f0 > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 > [ 12.937035] ? hib_submit_io+0xa5/0x110 > [ 12.937501] load_image+0x83/0x1a0 > [ 12.937919] swsusp_read+0x17f/0x1d0 > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 > [ 12.938967] load_image_and_restore+0x45/0xc0 > [ 12.939494] software_resume+0x13c/0x180 > [ 12.939994] resume_store+0xa3/0x1d0 > > The commit being fixed introduced a bug in copying the zero bitmap > to safe pages. A temporary bitmap is allocated in prepare_image() > to make a copy of zero bitmap after the unsafe pages are marked. > Freeing this temporary bitmap later results in an inconsistent state > of unsafe pages. Since free bit is left as is for this temporary bitmap > after free, these pages are treated as unsafe pages when they are > allocated again. This results in incorrect calculation of the number > of pages pre-allocated for the image. > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; > > The allocate_unsafe_pages is estimated to be higher than the actual > which results in running short of pages in safe_pages_list. Hence the > crash is observed in get_buffer() due to NULL pointer access of > safe_pages_list. Rafael pulled https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=f0c7183008b41e92fa676406d87f18773724b48b which addresses the null pointer dereference which regardless shouldn't be touching the list directly and should be using __get_safe_page(). I'll review this patch in the next day or two. > > Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> > --- > kernel/power/snapshot.c | 23 ++++++++++++++--------- > 1 file changed, 14 insertions(+), 9 deletions(-) > > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c > index 87e9f7e2bdc0..cb7341a71a21 100644 > --- a/kernel/power/snapshot.c > +++ b/kernel/power/snapshot.c > @@ -2628,7 +2628,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > struct memory_bitmap *zero_bm) > { > unsigned int nr_pages, nr_highmem; > - struct memory_bitmap tmp; > + struct memory_bitmap tmp_zero_bm; > struct linked_page *lp; > int error; > > @@ -2636,6 +2636,16 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > free_image_page(buffer, PG_UNSAFE_CLEAR); > buffer = NULL; > > + /* > + * Allocate a temporary bitmap to create a copy of zero_bm in > + * safe pages. This allocation needs to be done before marking > + * unsafe pages below so that it can be freed without altering > + * the state of unsafe pages. > + */ > + error = memory_bm_create(&tmp_zero_bm, GFP_ATOMIC, PG_ANY); > + if (error) > + goto Free; > + > nr_highmem = count_highmem_image_pages(bm); > mark_unsafe_pages(bm); > > @@ -2646,12 +2656,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > duplicate_memory_bitmap(new_bm, bm); > memory_bm_free(bm, PG_UNSAFE_KEEP); > > - /* Make a copy of zero_bm so it can be created in safe pages */ > - error = memory_bm_create(&tmp, GFP_ATOMIC, PG_ANY); > - if (error) > - goto Free; > - > - duplicate_memory_bitmap(&tmp, zero_bm); > + duplicate_memory_bitmap(&tmp_zero_bm, zero_bm); > memory_bm_free(zero_bm, PG_UNSAFE_KEEP); > > /* Recreate zero_bm in safe pages */ > @@ -2659,8 +2664,8 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > if (error) > goto Free; > > - duplicate_memory_bitmap(zero_bm, &tmp); > - memory_bm_free(&tmp, PG_UNSAFE_KEEP); > + duplicate_memory_bitmap(zero_bm, &tmp_zero_bm); > + memory_bm_free(&tmp_zero_bm, PG_UNSAFE_KEEP); > /* At this point zero_bm is in safe pages and it can be used for restoring. */ > > if (nr_highmem > 0) { > > --- > base-commit: 6465e260f48790807eef06b583b38ca9789b6072 > change-id: 20230929-hib_zero_bitmap_fix-bc5884eba0ae > > Best regards, > -- > Pavankumar Kondeti <quic_pkondeti@quicinc.com> Thanks, Brian >
On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti <quic_pkondeti@quicinc.com> wrote: > > The following crash is observed 100% of the time during resume from > the hibernation on a x86 QEMU system. > > [ 12.931887] ? __die_body+0x1a/0x60 > [ 12.932324] ? page_fault_oops+0x156/0x420 > [ 12.932824] ? search_exception_tables+0x37/0x50 > [ 12.933389] ? fixup_exception+0x21/0x300 > [ 12.933889] ? exc_page_fault+0x69/0x150 > [ 12.934371] ? asm_exc_page_fault+0x26/0x30 > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 > [ 12.935428] snapshot_write_next+0x7c/0x9f0 > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 > [ 12.937035] ? hib_submit_io+0xa5/0x110 > [ 12.937501] load_image+0x83/0x1a0 > [ 12.937919] swsusp_read+0x17f/0x1d0 > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 > [ 12.938967] load_image_and_restore+0x45/0xc0 > [ 12.939494] software_resume+0x13c/0x180 > [ 12.939994] resume_store+0xa3/0x1d0 > > The commit being fixed introduced a bug in copying the zero bitmap > to safe pages. A temporary bitmap is allocated in prepare_image() > to make a copy of zero bitmap after the unsafe pages are marked. > Freeing this temporary bitmap later results in an inconsistent state > of unsafe pages. Since free bit is left as is for this temporary bitmap > after free, these pages are treated as unsafe pages when they are > allocated again. This results in incorrect calculation of the number > of pages pre-allocated for the image. > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; > > The allocate_unsafe_pages is estimated to be higher than the actual > which results in running short of pages in safe_pages_list. Hence the > crash is observed in get_buffer() due to NULL pointer access of > safe_pages_list. After reading through the code, perhaps I'm missing something, I'm not sure that this is really fixing the problem. It seems like the problem would be that memory_bm_create() results in calls to get_image_page() w/ safe_needed = PG_ANY == 0, meaning that get_image_page() will not touch allocated_unsafe_pages and instead will mark the page as in use by setting it in the forbidden_pages_map and the free_pages_map simultaneously. The problem is that the free_pages_map was already populated by the call to mark_unsafe_pages, meaning that if we allocated a safe page in get_image_page() we just set the free bit when it otherwise should not be set. When the page is later free'd via the call to memory_bm_free(&tmp, PG_UNSAFE_KEEP), it results in calls to free_image_page() w/ clear_page_nosave = PG_UNSAFE_KEEP == 0. This means that we do not touch the free_pages_map because we don't call swsusp_unset_page_free(). With all that being said it seems like the correct way to deal with that would be to do: error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE); Here we know that the pages were not in the free_pages_map initially. Followed by freeing it as: memory_bm_free(&tmp, PG_UNSAFE_CLEAR); And here we know that swsusp_unset_page_free() will be called making sure the page is not in the free_pages_map afterwards. And that should result in an unchanged free_pages_map. Does that make sense? Please correct me if I'm misunderstanding something. > > Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> > --- > kernel/power/snapshot.c | 23 ++++++++++++++--------- > 1 file changed, 14 insertions(+), 9 deletions(-) > > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c > index 87e9f7e2bdc0..cb7341a71a21 100644 > --- a/kernel/power/snapshot.c > +++ b/kernel/power/snapshot.c > @@ -2628,7 +2628,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > struct memory_bitmap *zero_bm) > { > unsigned int nr_pages, nr_highmem; > - struct memory_bitmap tmp; > + struct memory_bitmap tmp_zero_bm; > struct linked_page *lp; > int error; > > @@ -2636,6 +2636,16 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > free_image_page(buffer, PG_UNSAFE_CLEAR); > buffer = NULL; > > + /* > + * Allocate a temporary bitmap to create a copy of zero_bm in > + * safe pages. This allocation needs to be done before marking > + * unsafe pages below so that it can be freed without altering > + * the state of unsafe pages. > + */ > + error = memory_bm_create(&tmp_zero_bm, GFP_ATOMIC, PG_ANY); > + if (error) > + goto Free; > + > nr_highmem = count_highmem_image_pages(bm); > mark_unsafe_pages(bm); > > @@ -2646,12 +2656,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > duplicate_memory_bitmap(new_bm, bm); > memory_bm_free(bm, PG_UNSAFE_KEEP); > > - /* Make a copy of zero_bm so it can be created in safe pages */ > - error = memory_bm_create(&tmp, GFP_ATOMIC, PG_ANY); > - if (error) > - goto Free; > - > - duplicate_memory_bitmap(&tmp, zero_bm); > + duplicate_memory_bitmap(&tmp_zero_bm, zero_bm); > memory_bm_free(zero_bm, PG_UNSAFE_KEEP); > > /* Recreate zero_bm in safe pages */ > @@ -2659,8 +2664,8 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > if (error) > goto Free; > > - duplicate_memory_bitmap(zero_bm, &tmp); > - memory_bm_free(&tmp, PG_UNSAFE_KEEP); > + duplicate_memory_bitmap(zero_bm, &tmp_zero_bm); > + memory_bm_free(&tmp_zero_bm, PG_UNSAFE_KEEP); > /* At this point zero_bm is in safe pages and it can be used for restoring. */ > > if (nr_highmem > 0) { > > --- > base-commit: 6465e260f48790807eef06b583b38ca9789b6072 > change-id: 20230929-hib_zero_bitmap_fix-bc5884eba0ae > > Best regards, > -- > Pavankumar Kondeti <quic_pkondeti@quicinc.com> >
On Mon, Oct 2, 2023 at 1:56 PM Brian Geffon <bgeffon@google.com> wrote: > > On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti > <quic_pkondeti@quicinc.com> wrote: > > > > The following crash is observed 100% of the time during resume from > > the hibernation on a x86 QEMU system. > > > > [ 12.931887] ? __die_body+0x1a/0x60 > > [ 12.932324] ? page_fault_oops+0x156/0x420 > > [ 12.932824] ? search_exception_tables+0x37/0x50 > > [ 12.933389] ? fixup_exception+0x21/0x300 > > [ 12.933889] ? exc_page_fault+0x69/0x150 > > [ 12.934371] ? asm_exc_page_fault+0x26/0x30 > > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 > > [ 12.935428] snapshot_write_next+0x7c/0x9f0 > > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 > > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 > > [ 12.937035] ? hib_submit_io+0xa5/0x110 > > [ 12.937501] load_image+0x83/0x1a0 > > [ 12.937919] swsusp_read+0x17f/0x1d0 > > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 > > [ 12.938967] load_image_and_restore+0x45/0xc0 > > [ 12.939494] software_resume+0x13c/0x180 > > [ 12.939994] resume_store+0xa3/0x1d0 > > > > The commit being fixed introduced a bug in copying the zero bitmap > > to safe pages. A temporary bitmap is allocated in prepare_image() > > to make a copy of zero bitmap after the unsafe pages are marked. > > Freeing this temporary bitmap later results in an inconsistent state > > of unsafe pages. Since free bit is left as is for this temporary bitmap > > after free, these pages are treated as unsafe pages when they are > > allocated again. This results in incorrect calculation of the number > > of pages pre-allocated for the image. > > > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; > > > > The allocate_unsafe_pages is estimated to be higher than the actual > > which results in running short of pages in safe_pages_list. Hence the > > crash is observed in get_buffer() due to NULL pointer access of > > safe_pages_list. > > After reading through the code, perhaps I'm missing something, I'm not > sure that this is really fixing the problem. > > It seems like the problem would be that memory_bm_create() results in > calls to get_image_page() w/ safe_needed = PG_ANY == 0, meaning that > get_image_page() will not touch allocated_unsafe_pages and instead > will mark the page as in use by setting it in the forbidden_pages_map > and the free_pages_map simultaneously. The problem is that the > free_pages_map was already populated by the call to mark_unsafe_pages, > meaning that if we allocated a safe page in get_image_page() we just > set the free bit when it otherwise should not be set. > > When the page is later free'd via the call to memory_bm_free(&tmp, > PG_UNSAFE_KEEP), it results in calls to free_image_page() w/ > clear_page_nosave = PG_UNSAFE_KEEP == 0. This means that we do not > touch the free_pages_map because we don't call > swsusp_unset_page_free(). > > With all that being said it seems like the correct way to deal with > that would be to do: > error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE); > Here we know that the pages were not in the free_pages_map initially. > > Followed by freeing it as: > memory_bm_free(&tmp, PG_UNSAFE_CLEAR); > And here we know that swsusp_unset_page_free() will be called making > sure the page is not in the free_pages_map afterwards. > > And that should result in an unchanged free_pages_map. Does that make > sense? Please correct me if I'm misunderstanding something. > To restate this another way, if I'm reading it correctly, I think the outcome is actually nearly the same, the difference is, when allocating the bitmap before w/ PG_ANY we're setting bits in the free_page_list which will be unset a few lines later in the call to mark_unsafe_pages(), and then we won't touch the free_pages_list during the memory_bm_free() because it's called with PG_UNSAFE_KEEP. > > > > Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") > > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> > > --- > > kernel/power/snapshot.c | 23 ++++++++++++++--------- > > 1 file changed, 14 insertions(+), 9 deletions(-) > > > > diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c > > index 87e9f7e2bdc0..cb7341a71a21 100644 > > --- a/kernel/power/snapshot.c > > +++ b/kernel/power/snapshot.c > > @@ -2628,7 +2628,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > > struct memory_bitmap *zero_bm) > > { > > unsigned int nr_pages, nr_highmem; > > - struct memory_bitmap tmp; > > + struct memory_bitmap tmp_zero_bm; > > struct linked_page *lp; > > int error; > > > > @@ -2636,6 +2636,16 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > > free_image_page(buffer, PG_UNSAFE_CLEAR); > > buffer = NULL; > > > > + /* > > + * Allocate a temporary bitmap to create a copy of zero_bm in > > + * safe pages. This allocation needs to be done before marking > > + * unsafe pages below so that it can be freed without altering > > + * the state of unsafe pages. > > + */ > > + error = memory_bm_create(&tmp_zero_bm, GFP_ATOMIC, PG_ANY); > > + if (error) > > + goto Free; > > + > > nr_highmem = count_highmem_image_pages(bm); > > mark_unsafe_pages(bm); > > > > @@ -2646,12 +2656,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > > duplicate_memory_bitmap(new_bm, bm); > > memory_bm_free(bm, PG_UNSAFE_KEEP); > > > > - /* Make a copy of zero_bm so it can be created in safe pages */ > > - error = memory_bm_create(&tmp, GFP_ATOMIC, PG_ANY); > > - if (error) > > - goto Free; > > - > > - duplicate_memory_bitmap(&tmp, zero_bm); > > + duplicate_memory_bitmap(&tmp_zero_bm, zero_bm); > > memory_bm_free(zero_bm, PG_UNSAFE_KEEP); > > > > /* Recreate zero_bm in safe pages */ > > @@ -2659,8 +2664,8 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, > > if (error) > > goto Free; > > > > - duplicate_memory_bitmap(zero_bm, &tmp); > > - memory_bm_free(&tmp, PG_UNSAFE_KEEP); > > + duplicate_memory_bitmap(zero_bm, &tmp_zero_bm); > > + memory_bm_free(&tmp_zero_bm, PG_UNSAFE_KEEP); > > /* At this point zero_bm is in safe pages and it can be used for restoring. */ > > > > if (nr_highmem > 0) { > > > > --- > > base-commit: 6465e260f48790807eef06b583b38ca9789b6072 > > change-id: 20230929-hib_zero_bitmap_fix-bc5884eba0ae > > > > Best regards, > > -- > > Pavankumar Kondeti <quic_pkondeti@quicinc.com> > >
On Mon, Oct 02, 2023 at 02:34:20PM -0400, Brian Geffon wrote: > On Mon, Oct 2, 2023 at 1:56 PM Brian Geffon <bgeffon@google.com> wrote: > > > > On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti > > <quic_pkondeti@quicinc.com> wrote: > > > > > > The following crash is observed 100% of the time during resume from > > > the hibernation on a x86 QEMU system. > > > > > > [ 12.931887] ? __die_body+0x1a/0x60 > > > [ 12.932324] ? page_fault_oops+0x156/0x420 > > > [ 12.932824] ? search_exception_tables+0x37/0x50 > > > [ 12.933389] ? fixup_exception+0x21/0x300 > > > [ 12.933889] ? exc_page_fault+0x69/0x150 > > > [ 12.934371] ? asm_exc_page_fault+0x26/0x30 > > > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 > > > [ 12.935428] snapshot_write_next+0x7c/0x9f0 > > > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 > > > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 > > > [ 12.937035] ? hib_submit_io+0xa5/0x110 > > > [ 12.937501] load_image+0x83/0x1a0 > > > [ 12.937919] swsusp_read+0x17f/0x1d0 > > > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 > > > [ 12.938967] load_image_and_restore+0x45/0xc0 > > > [ 12.939494] software_resume+0x13c/0x180 > > > [ 12.939994] resume_store+0xa3/0x1d0 > > > > > > The commit being fixed introduced a bug in copying the zero bitmap > > > to safe pages. A temporary bitmap is allocated in prepare_image() > > > to make a copy of zero bitmap after the unsafe pages are marked. > > > Freeing this temporary bitmap later results in an inconsistent state > > > of unsafe pages. Since free bit is left as is for this temporary bitmap > > > after free, these pages are treated as unsafe pages when they are > > > allocated again. This results in incorrect calculation of the number > > > of pages pre-allocated for the image. > > > > > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; > > > > > > The allocate_unsafe_pages is estimated to be higher than the actual > > > which results in running short of pages in safe_pages_list. Hence the > > > crash is observed in get_buffer() due to NULL pointer access of > > > safe_pages_list. > > > > After reading through the code, perhaps I'm missing something, I'm not > > sure that this is really fixing the problem. > > > > It seems like the problem would be that memory_bm_create() results in > > calls to get_image_page() w/ safe_needed = PG_ANY == 0, meaning that > > get_image_page() will not touch allocated_unsafe_pages and instead > > will mark the page as in use by setting it in the forbidden_pages_map > > and the free_pages_map simultaneously. The problem is that the > > free_pages_map was already populated by the call to mark_unsafe_pages, > > meaning that if we allocated a safe page in get_image_page() we just > > set the free bit when it otherwise should not be set. > > > > When the page is later free'd via the call to memory_bm_free(&tmp, > > PG_UNSAFE_KEEP), it results in calls to free_image_page() w/ > > clear_page_nosave = PG_UNSAFE_KEEP == 0. This means that we do not > > touch the free_pages_map because we don't call > > swsusp_unset_page_free(). > > > > With all that being said it seems like the correct way to deal with > > that would be to do: > > error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE); > > Here we know that the pages were not in the free_pages_map initially. > > > > Followed by freeing it as: > > memory_bm_free(&tmp, PG_UNSAFE_CLEAR); > > And here we know that swsusp_unset_page_free() will be called making > > sure the page is not in the free_pages_map afterwards. > > > > And that should result in an unchanged free_pages_map. Does that make > > sense? Please correct me if I'm misunderstanding something. > > Thanks for your review. Appreciate the detailed summary. > > To restate this another way, if I'm reading it correctly, I think the > outcome is actually nearly the same, the difference is, when > allocating the bitmap before w/ PG_ANY we're setting bits in the > free_page_list which will be unset a few lines later in the call to > mark_unsafe_pages(), and then we won't touch the free_pages_list > during the memory_bm_free() because it's called with PG_UNSAFE_KEEP. > The current patch and your suggestion both gives the same effect like you said. Since it is a temporary buffer to hold the zero bit map page, I did not allocate safe pages. Allocating safe pages means a bit more work. In this case this it is not completely throw away work but re-ordering the call seems to be simple here. Pls let me know if you want to me incorporate your suggestion. Thanks, Pavan
On Sat, Sep 30, 2023 at 07:37:13AM -0400, Brian Geffon wrote: > On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti > <quic_pkondeti@quicinc.com> wrote: > > > Hi Pavankumar, > > > The following crash is observed 100% of the time during resume from > > the hibernation on a x86 QEMU system. > > > > [ 12.931887] ? __die_body+0x1a/0x60 > > [ 12.932324] ? page_fault_oops+0x156/0x420 > > [ 12.932824] ? search_exception_tables+0x37/0x50 > > [ 12.933389] ? fixup_exception+0x21/0x300 > > [ 12.933889] ? exc_page_fault+0x69/0x150 > > [ 12.934371] ? asm_exc_page_fault+0x26/0x30 > > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 > > [ 12.935428] snapshot_write_next+0x7c/0x9f0 > > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 > > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 > > [ 12.937035] ? hib_submit_io+0xa5/0x110 > > [ 12.937501] load_image+0x83/0x1a0 > > [ 12.937919] swsusp_read+0x17f/0x1d0 > > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 > > [ 12.938967] load_image_and_restore+0x45/0xc0 > > [ 12.939494] software_resume+0x13c/0x180 > > [ 12.939994] resume_store+0xa3/0x1d0 > > > > The commit being fixed introduced a bug in copying the zero bitmap > > to safe pages. A temporary bitmap is allocated in prepare_image() > > to make a copy of zero bitmap after the unsafe pages are marked. > > Freeing this temporary bitmap later results in an inconsistent state > > of unsafe pages. Since free bit is left as is for this temporary bitmap > > after free, these pages are treated as unsafe pages when they are > > allocated again. This results in incorrect calculation of the number > > of pages pre-allocated for the image. > > > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; > > > > The allocate_unsafe_pages is estimated to be higher than the actual > > which results in running short of pages in safe_pages_list. Hence the > > crash is observed in get_buffer() due to NULL pointer access of > > safe_pages_list. > > Rafael pulled https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=f0c7183008b41e92fa676406d87f18773724b48b > which addresses the null pointer dereference which regardless > shouldn't be touching the list directly and should be using > __get_safe_page(). Thanks for pointing me to this. I have verified hibernation by pulling this commit to v6.6-rc3 and it works as expected. This commit is currently queued for v6.7, can it be included in next -rc or we have to apply the patch I have sent to make sure that hibernation works on v6.6 when it gets released. > > > > > Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") > > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> Thanks, Pavan
On Mon, Oct 2, 2023 at 11:08 PM Pavan Kondeti <quic_pkondeti@quicinc.com> wrote: > > On Sat, Sep 30, 2023 at 07:37:13AM -0400, Brian Geffon wrote: > > On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti > > <quic_pkondeti@quicinc.com> wrote: > > > > > Hi Pavankumar, > > > > > The following crash is observed 100% of the time during resume from > > > the hibernation on a x86 QEMU system. > > > > > > [ 12.931887] ? __die_body+0x1a/0x60 > > > [ 12.932324] ? page_fault_oops+0x156/0x420 > > > [ 12.932824] ? search_exception_tables+0x37/0x50 > > > [ 12.933389] ? fixup_exception+0x21/0x300 > > > [ 12.933889] ? exc_page_fault+0x69/0x150 > > > [ 12.934371] ? asm_exc_page_fault+0x26/0x30 > > > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 > > > [ 12.935428] snapshot_write_next+0x7c/0x9f0 > > > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 > > > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 > > > [ 12.937035] ? hib_submit_io+0xa5/0x110 > > > [ 12.937501] load_image+0x83/0x1a0 > > > [ 12.937919] swsusp_read+0x17f/0x1d0 > > > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 > > > [ 12.938967] load_image_and_restore+0x45/0xc0 > > > [ 12.939494] software_resume+0x13c/0x180 > > > [ 12.939994] resume_store+0xa3/0x1d0 > > > > > > The commit being fixed introduced a bug in copying the zero bitmap > > > to safe pages. A temporary bitmap is allocated in prepare_image() > > > to make a copy of zero bitmap after the unsafe pages are marked. > > > Freeing this temporary bitmap later results in an inconsistent state > > > of unsafe pages. Since free bit is left as is for this temporary bitmap > > > after free, these pages are treated as unsafe pages when they are > > > allocated again. This results in incorrect calculation of the number > > > of pages pre-allocated for the image. > > > > > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; > > > > > > The allocate_unsafe_pages is estimated to be higher than the actual > > > which results in running short of pages in safe_pages_list. Hence the > > > crash is observed in get_buffer() due to NULL pointer access of > > > safe_pages_list. > > > > Rafael pulled https://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git/commit/?h=linux-next&id=f0c7183008b41e92fa676406d87f18773724b48b > > which addresses the null pointer dereference which regardless > > shouldn't be touching the list directly and should be using > > __get_safe_page(). > > Thanks for pointing me to this. I have verified hibernation by pulling this > commit to v6.6-rc3 and it works as expected. > > This commit is currently queued for v6.7, can it be included in next -rc or > we have to apply the patch I have sent to make sure that hibernation works on > v6.6 when it gets released. It was CCed to stable, so it will be included in stable kernels. > > > > > > > > > Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") > > > Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> > > Thanks, > Pavan
On Mon, Oct 2, 2023 at 11:05 PM Pavan Kondeti <quic_pkondeti@quicinc.com> wrote: > > On Mon, Oct 02, 2023 at 02:34:20PM -0400, Brian Geffon wrote: > > On Mon, Oct 2, 2023 at 1:56 PM Brian Geffon <bgeffon@google.com> wrote: > > > > > > On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti > > > <quic_pkondeti@quicinc.com> wrote: > > > > > > > > The following crash is observed 100% of the time during resume from > > > > the hibernation on a x86 QEMU system. > > > > > > > > [ 12.931887] ? __die_body+0x1a/0x60 > > > > [ 12.932324] ? page_fault_oops+0x156/0x420 > > > > [ 12.932824] ? search_exception_tables+0x37/0x50 > > > > [ 12.933389] ? fixup_exception+0x21/0x300 > > > > [ 12.933889] ? exc_page_fault+0x69/0x150 > > > > [ 12.934371] ? asm_exc_page_fault+0x26/0x30 > > > > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 > > > > [ 12.935428] snapshot_write_next+0x7c/0x9f0 > > > > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 > > > > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 > > > > [ 12.937035] ? hib_submit_io+0xa5/0x110 > > > > [ 12.937501] load_image+0x83/0x1a0 > > > > [ 12.937919] swsusp_read+0x17f/0x1d0 > > > > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 > > > > [ 12.938967] load_image_and_restore+0x45/0xc0 > > > > [ 12.939494] software_resume+0x13c/0x180 > > > > [ 12.939994] resume_store+0xa3/0x1d0 > > > > > > > > The commit being fixed introduced a bug in copying the zero bitmap > > > > to safe pages. A temporary bitmap is allocated in prepare_image() > > > > to make a copy of zero bitmap after the unsafe pages are marked. > > > > Freeing this temporary bitmap later results in an inconsistent state > > > > of unsafe pages. Since free bit is left as is for this temporary bitmap > > > > after free, these pages are treated as unsafe pages when they are > > > > allocated again. This results in incorrect calculation of the number > > > > of pages pre-allocated for the image. > > > > > > > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; > > > > > > > > The allocate_unsafe_pages is estimated to be higher than the actual > > > > which results in running short of pages in safe_pages_list. Hence the > > > > crash is observed in get_buffer() due to NULL pointer access of > > > > safe_pages_list. > > > > > > After reading through the code, perhaps I'm missing something, I'm not > > > sure that this is really fixing the problem. > > > > > > It seems like the problem would be that memory_bm_create() results in > > > calls to get_image_page() w/ safe_needed = PG_ANY == 0, meaning that > > > get_image_page() will not touch allocated_unsafe_pages and instead > > > will mark the page as in use by setting it in the forbidden_pages_map > > > and the free_pages_map simultaneously. The problem is that the > > > free_pages_map was already populated by the call to mark_unsafe_pages, > > > meaning that if we allocated a safe page in get_image_page() we just > > > set the free bit when it otherwise should not be set. > > > > > > When the page is later free'd via the call to memory_bm_free(&tmp, > > > PG_UNSAFE_KEEP), it results in calls to free_image_page() w/ > > > clear_page_nosave = PG_UNSAFE_KEEP == 0. This means that we do not > > > touch the free_pages_map because we don't call > > > swsusp_unset_page_free(). > > > > > > With all that being said it seems like the correct way to deal with > > > that would be to do: > > > error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE); > > > Here we know that the pages were not in the free_pages_map initially. > > > > > > Followed by freeing it as: > > > memory_bm_free(&tmp, PG_UNSAFE_CLEAR); > > > And here we know that swsusp_unset_page_free() will be called making > > > sure the page is not in the free_pages_map afterwards. > > > > > > And that should result in an unchanged free_pages_map. Does that make > > > sense? Please correct me if I'm misunderstanding something. > > > > > Thanks for your review. Appreciate the detailed summary. > > > > > To restate this another way, if I'm reading it correctly, I think the > > outcome is actually nearly the same, the difference is, when > > allocating the bitmap before w/ PG_ANY we're setting bits in the > > free_page_list which will be unset a few lines later in the call to > > mark_unsafe_pages(), and then we won't touch the free_pages_list > > during the memory_bm_free() because it's called with PG_UNSAFE_KEEP. > > > > The current patch and your suggestion both gives the same effect like > you said. Since it is a temporary buffer to hold the zero bit map page, I > did not allocate safe pages. Allocating safe pages means a bit more > work. In this case this it is not completely throw away work but > re-ordering the call seems to be simple here. Pls let me know if you > want to me incorporate your suggestion. My personal opinion is that PG_SAFE makes a bit more sense, it's not really wasted work as any pages which are not safe end up being added to the allocated_unsafe_pages pool. > > Thanks, > Pavan
On Tue, Oct 03, 2023 at 10:36:25AM -0400, Brian Geffon wrote: > On Mon, Oct 2, 2023 at 11:05 PM Pavan Kondeti <quic_pkondeti@quicinc.com> wrote: > > > > On Mon, Oct 02, 2023 at 02:34:20PM -0400, Brian Geffon wrote: > > > On Mon, Oct 2, 2023 at 1:56 PM Brian Geffon <bgeffon@google.com> wrote: > > > > > > > > On Fri, Sep 29, 2023 at 1:31 PM Pavankumar Kondeti > > > > <quic_pkondeti@quicinc.com> wrote: > > > > > > > > > > The following crash is observed 100% of the time during resume from > > > > > the hibernation on a x86 QEMU system. > > > > > > > > > > [ 12.931887] ? __die_body+0x1a/0x60 > > > > > [ 12.932324] ? page_fault_oops+0x156/0x420 > > > > > [ 12.932824] ? search_exception_tables+0x37/0x50 > > > > > [ 12.933389] ? fixup_exception+0x21/0x300 > > > > > [ 12.933889] ? exc_page_fault+0x69/0x150 > > > > > [ 12.934371] ? asm_exc_page_fault+0x26/0x30 > > > > > [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 > > > > > [ 12.935428] snapshot_write_next+0x7c/0x9f0 > > > > > [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 > > > > > [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 > > > > > [ 12.937035] ? hib_submit_io+0xa5/0x110 > > > > > [ 12.937501] load_image+0x83/0x1a0 > > > > > [ 12.937919] swsusp_read+0x17f/0x1d0 > > > > > [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 > > > > > [ 12.938967] load_image_and_restore+0x45/0xc0 > > > > > [ 12.939494] software_resume+0x13c/0x180 > > > > > [ 12.939994] resume_store+0xa3/0x1d0 > > > > > > > > > > The commit being fixed introduced a bug in copying the zero bitmap > > > > > to safe pages. A temporary bitmap is allocated in prepare_image() > > > > > to make a copy of zero bitmap after the unsafe pages are marked. > > > > > Freeing this temporary bitmap later results in an inconsistent state > > > > > of unsafe pages. Since free bit is left as is for this temporary bitmap > > > > > after free, these pages are treated as unsafe pages when they are > > > > > allocated again. This results in incorrect calculation of the number > > > > > of pages pre-allocated for the image. > > > > > > > > > > nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; > > > > > > > > > > The allocate_unsafe_pages is estimated to be higher than the actual > > > > > which results in running short of pages in safe_pages_list. Hence the > > > > > crash is observed in get_buffer() due to NULL pointer access of > > > > > safe_pages_list. > > > > > > > > After reading through the code, perhaps I'm missing something, I'm not > > > > sure that this is really fixing the problem. > > > > > > > > It seems like the problem would be that memory_bm_create() results in > > > > calls to get_image_page() w/ safe_needed = PG_ANY == 0, meaning that > > > > get_image_page() will not touch allocated_unsafe_pages and instead > > > > will mark the page as in use by setting it in the forbidden_pages_map > > > > and the free_pages_map simultaneously. The problem is that the > > > > free_pages_map was already populated by the call to mark_unsafe_pages, > > > > meaning that if we allocated a safe page in get_image_page() we just > > > > set the free bit when it otherwise should not be set. > > > > > > > > When the page is later free'd via the call to memory_bm_free(&tmp, > > > > PG_UNSAFE_KEEP), it results in calls to free_image_page() w/ > > > > clear_page_nosave = PG_UNSAFE_KEEP == 0. This means that we do not > > > > touch the free_pages_map because we don't call > > > > swsusp_unset_page_free(). > > > > > > > > With all that being said it seems like the correct way to deal with > > > > that would be to do: > > > > error = memory_bm_create(&tmp, GFP_ATOMIC, PG_SAFE); > > > > Here we know that the pages were not in the free_pages_map initially. > > > > > > > > Followed by freeing it as: > > > > memory_bm_free(&tmp, PG_UNSAFE_CLEAR); > > > > And here we know that swsusp_unset_page_free() will be called making > > > > sure the page is not in the free_pages_map afterwards. > > > > > > > > And that should result in an unchanged free_pages_map. Does that make > > > > sense? Please correct me if I'm misunderstanding something. > > > > > > > > Thanks for your review. Appreciate the detailed summary. > > > > > > > > To restate this another way, if I'm reading it correctly, I think the > > > outcome is actually nearly the same, the difference is, when > > > allocating the bitmap before w/ PG_ANY we're setting bits in the > > > free_page_list which will be unset a few lines later in the call to > > > mark_unsafe_pages(), and then we won't touch the free_pages_list > > > during the memory_bm_free() because it's called with PG_UNSAFE_KEEP. > > > > > > > The current patch and your suggestion both gives the same effect like > > you said. Since it is a temporary buffer to hold the zero bit map page, I > > did not allocate safe pages. Allocating safe pages means a bit more > > work. In this case this it is not completely throw away work but > > re-ordering the call seems to be simple here. Pls let me know if you > > want to me incorporate your suggestion. > > My personal opinion is that PG_SAFE makes a bit more sense, it's not > really wasted work as any pages which are not safe end up being added > to the allocated_unsafe_pages pool. > Yes, the extra bit of works does not go waste. I will send v2 with your suggestion. Thanks a lot for your review and detailed comments. Thanks, Pavan
diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index 87e9f7e2bdc0..cb7341a71a21 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -2628,7 +2628,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, struct memory_bitmap *zero_bm) { unsigned int nr_pages, nr_highmem; - struct memory_bitmap tmp; + struct memory_bitmap tmp_zero_bm; struct linked_page *lp; int error; @@ -2636,6 +2636,16 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, free_image_page(buffer, PG_UNSAFE_CLEAR); buffer = NULL; + /* + * Allocate a temporary bitmap to create a copy of zero_bm in + * safe pages. This allocation needs to be done before marking + * unsafe pages below so that it can be freed without altering + * the state of unsafe pages. + */ + error = memory_bm_create(&tmp_zero_bm, GFP_ATOMIC, PG_ANY); + if (error) + goto Free; + nr_highmem = count_highmem_image_pages(bm); mark_unsafe_pages(bm); @@ -2646,12 +2656,7 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, duplicate_memory_bitmap(new_bm, bm); memory_bm_free(bm, PG_UNSAFE_KEEP); - /* Make a copy of zero_bm so it can be created in safe pages */ - error = memory_bm_create(&tmp, GFP_ATOMIC, PG_ANY); - if (error) - goto Free; - - duplicate_memory_bitmap(&tmp, zero_bm); + duplicate_memory_bitmap(&tmp_zero_bm, zero_bm); memory_bm_free(zero_bm, PG_UNSAFE_KEEP); /* Recreate zero_bm in safe pages */ @@ -2659,8 +2664,8 @@ static int prepare_image(struct memory_bitmap *new_bm, struct memory_bitmap *bm, if (error) goto Free; - duplicate_memory_bitmap(zero_bm, &tmp); - memory_bm_free(&tmp, PG_UNSAFE_KEEP); + duplicate_memory_bitmap(zero_bm, &tmp_zero_bm); + memory_bm_free(&tmp_zero_bm, PG_UNSAFE_KEEP); /* At this point zero_bm is in safe pages and it can be used for restoring. */ if (nr_highmem > 0) {
The following crash is observed 100% of the time during resume from the hibernation on a x86 QEMU system. [ 12.931887] ? __die_body+0x1a/0x60 [ 12.932324] ? page_fault_oops+0x156/0x420 [ 12.932824] ? search_exception_tables+0x37/0x50 [ 12.933389] ? fixup_exception+0x21/0x300 [ 12.933889] ? exc_page_fault+0x69/0x150 [ 12.934371] ? asm_exc_page_fault+0x26/0x30 [ 12.934869] ? get_buffer.constprop.0+0xac/0x100 [ 12.935428] snapshot_write_next+0x7c/0x9f0 [ 12.935929] ? submit_bio_noacct_nocheck+0x2c2/0x370 [ 12.936530] ? submit_bio_noacct+0x44/0x2c0 [ 12.937035] ? hib_submit_io+0xa5/0x110 [ 12.937501] load_image+0x83/0x1a0 [ 12.937919] swsusp_read+0x17f/0x1d0 [ 12.938355] ? create_basic_memory_bitmaps+0x1b7/0x240 [ 12.938967] load_image_and_restore+0x45/0xc0 [ 12.939494] software_resume+0x13c/0x180 [ 12.939994] resume_store+0xa3/0x1d0 The commit being fixed introduced a bug in copying the zero bitmap to safe pages. A temporary bitmap is allocated in prepare_image() to make a copy of zero bitmap after the unsafe pages are marked. Freeing this temporary bitmap later results in an inconsistent state of unsafe pages. Since free bit is left as is for this temporary bitmap after free, these pages are treated as unsafe pages when they are allocated again. This results in incorrect calculation of the number of pages pre-allocated for the image. nr_pages = (nr_zero_pages + nr_copy_pages) - nr_highmem - allocated_unsafe_pages; The allocate_unsafe_pages is estimated to be higher than the actual which results in running short of pages in safe_pages_list. Hence the crash is observed in get_buffer() due to NULL pointer access of safe_pages_list. Fixes: 005e8dddd497 ("PM: hibernate: don't store zero pages in the image file") Signed-off-by: Pavankumar Kondeti <quic_pkondeti@quicinc.com> --- kernel/power/snapshot.c | 23 ++++++++++++++--------- 1 file changed, 14 insertions(+), 9 deletions(-) --- base-commit: 6465e260f48790807eef06b583b38ca9789b6072 change-id: 20230929-hib_zero_bitmap_fix-bc5884eba0ae Best regards,