Message ID: 20210223215544.313871-1-mike.kravetz@oracle.com (mailing list archive)
State: New, archived
Series: hugetlb: fix uninitialized subpool pointer
On Tue, Feb 23, 2021 at 01:55:44PM -0800, Mike Kravetz wrote:
> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> with linux-next 5.12.0-20210222.
> Call trace:
>   hugepage_subpool_put_pages.part.0+0x2c/0x138
>   __free_huge_page+0xce/0x310
>   alloc_pool_huge_page+0x102/0x120
>   set_max_huge_pages+0x13e/0x350
>   hugetlb_sysctl_handler_common+0xd8/0x110
>   hugetlb_sysctl_handler+0x48/0x58
>   proc_sys_call_handler+0x138/0x238
>   new_sync_write+0x10e/0x198
>   vfs_write.part.0+0x12c/0x238
>   ksys_write+0x68/0xf8
>   do_syscall+0x82/0xd0
>   __do_syscall+0xb4/0xc8
>   system_call+0x72/0x98
>
> This is a result of the change which moved the hugetlb page subpool
> pointer from page->private to page[1]->private. When new pages are
> allocated from the buddy allocator, the private field of the head
> page will be cleared, but the private field of subpages is not modified.
> Therefore, old values may remain.
>
> Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
>
> Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Do we need the hugetlb_set_page_subpool() in __free_huge_page?

Reviewed-by: Oscar Salvador <osalvador@suse.de>
On 2/23/21 2:45 PM, Oscar Salvador wrote:
> On Tue, Feb 23, 2021 at 01:55:44PM -0800, Mike Kravetz wrote:
>> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
>> with linux-next 5.12.0-20210222.
>> Call trace:
>>   hugepage_subpool_put_pages.part.0+0x2c/0x138
>>   __free_huge_page+0xce/0x310
>>   alloc_pool_huge_page+0x102/0x120
>>   set_max_huge_pages+0x13e/0x350
>>   hugetlb_sysctl_handler_common+0xd8/0x110
>>   hugetlb_sysctl_handler+0x48/0x58
>>   proc_sys_call_handler+0x138/0x238
>>   new_sync_write+0x10e/0x198
>>   vfs_write.part.0+0x12c/0x238
>>   ksys_write+0x68/0xf8
>>   do_syscall+0x82/0xd0
>>   __do_syscall+0xb4/0xc8
>>   system_call+0x72/0x98
>>
>> This is a result of the change which moved the hugetlb page subpool
>> pointer from page->private to page[1]->private. When new pages are
>> allocated from the buddy allocator, the private field of the head
>> page will be cleared, but the private field of subpages is not modified.
>> Therefore, old values may remain.
>>
>> Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
>>
>> Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
>> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
>> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> Do we need the hugetlb_set_page_subpool() in __free_huge_page?

Yes, that is the more common case where the once active hugetlb page
will be simply added to the free list via enqueue_huge_page(). This
path does not go through prep_new_huge_page.
On 2021-02-23 23:55, Mike Kravetz wrote:
> Yes, that is the more common case where the once active hugetlb page
> will be simply added to the free list via enqueue_huge_page(). This
> path does not go through prep_new_huge_page.

Right, I see.

Thanks
On 2/23/21 2:58 PM, Oscar Salvador wrote:
> On 2021-02-23 23:55, Mike Kravetz wrote:
>> Yes, that is the more common case where the once active hugetlb page
>> will be simply added to the free list via enqueue_huge_page(). This
>> path does not go through prep_new_huge_page.
>
> Right, I see.
>
> Thanks

You got me thinking ...
When we dynamically allocate gigantic pages via alloc_contig_pages, we
will not use the buddy allocator. Therefore, the usual 'page prepping'
will not take place. Specifically, I could not find anything in that
path which clears page->private of the head page.
Am I missing that somewhere? If not, then we need to clear that as well
in prep_compound_gigantic_page. Or, just clear it in prep_new_huge_page
to handle any change in assumptions about the buddy allocator.

This is not something introduced with the recent field shuffling, it
looks like something that existed for some time.
On 2/23/21 3:21 PM, Mike Kravetz wrote:
> On 2/23/21 2:58 PM, Oscar Salvador wrote:
>> On 2021-02-23 23:55, Mike Kravetz wrote:
>>> Yes, that is the more common case where the once active hugetlb page
>>> will be simply added to the free list via enqueue_huge_page(). This
>>> path does not go through prep_new_huge_page.
>>
>> Right, I see.
>>
>> Thanks
>
> You got me thinking ...
> When we dynamically allocate gigantic pages via alloc_contig_pages, we
> will not use the buddy allocator. Therefore, the usual 'page prepping'
> will not take place. Specifically, I could not find anything in that
> path which clears page->private of the head page.
> Am I missing that somewhere? If not, then we need to clear that as well
> in prep_compound_gigantic_page. Or, just clear it in prep_new_huge_page
> to handle any change in assumptions about the buddy allocator.
>
> This is not something introduced with the recent field shuffling, it
> looks like something that existed for some time.

nm, we do end up calling the same page prepping code (post_alloc_hook)
from alloc_contig_range->isolate_freepages_range.

Just to make sure, I purposely dirtied page->private of every page as it
was being freed. Gigantic page allocation was just fine, and I even ran
ltp mm tests with this dirtying in place.
On Wed, Feb 24, 2021 at 5:56 AM Mike Kravetz <mike.kravetz@oracle.com> wrote:
>
> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> with linux-next 5.12.0-20210222.
> Call trace:
>   hugepage_subpool_put_pages.part.0+0x2c/0x138
>   __free_huge_page+0xce/0x310
>   alloc_pool_huge_page+0x102/0x120
>   set_max_huge_pages+0x13e/0x350
>   hugetlb_sysctl_handler_common+0xd8/0x110
>   hugetlb_sysctl_handler+0x48/0x58
>   proc_sys_call_handler+0x138/0x238
>   new_sync_write+0x10e/0x198
>   vfs_write.part.0+0x12c/0x238
>   ksys_write+0x68/0xf8
>   do_syscall+0x82/0xd0
>   __do_syscall+0xb4/0xc8
>   system_call+0x72/0x98
>
> This is a result of the change which moved the hugetlb page subpool
> pointer from page->private to page[1]->private. When new pages are
> allocated from the buddy allocator, the private field of the head
> page will be cleared, but the private field of subpages is not modified.
> Therefore, old values may remain.
>
> Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
>
> Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

> ---
>  mm/hugetlb.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c232cb67dda2..7ae5c18c98a7 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1465,6 +1465,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
>  	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> +	hugetlb_set_page_subpool(page, NULL);
>  	set_hugetlb_cgroup(page, NULL);
>  	set_hugetlb_cgroup_rsvd(page, NULL);
>  	spin_lock(&hugetlb_lock);
> --
> 2.29.2
>
On Tue 23-02-21 13:55:44, Mike Kravetz wrote:
> Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> with linux-next 5.12.0-20210222.
> Call trace:
>   hugepage_subpool_put_pages.part.0+0x2c/0x138
>   __free_huge_page+0xce/0x310
>   alloc_pool_huge_page+0x102/0x120
>   set_max_huge_pages+0x13e/0x350
>   hugetlb_sysctl_handler_common+0xd8/0x110
>   hugetlb_sysctl_handler+0x48/0x58
>   proc_sys_call_handler+0x138/0x238
>   new_sync_write+0x10e/0x198
>   vfs_write.part.0+0x12c/0x238
>   ksys_write+0x68/0xf8
>   do_syscall+0x82/0xd0
>   __do_syscall+0xb4/0xc8
>   system_call+0x72/0x98
>
> This is a result of the change which moved the hugetlb page subpool
> pointer from page->private to page[1]->private. When new pages are
> allocated from the buddy allocator, the private field of the head
> page will be cleared, but the private field of subpages is not modified.
> Therefore, old values may remain.

Very interesting. I have expected that the page->private would be in a
reasonable state when allocated. On the other hand hugetlb doesn't do
__GFP_COMP so tail pages are not initialized by the allocator.

> Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
>
> Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")

This is not a stable sha to refer to as it comes from linux next.

> Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>

Acked-by: Michal Hocko <mhocko@suse.com>

I think this would be worth a separate patch rather than having it
folded into the original patch. This is really subtle.

> ---
>  mm/hugetlb.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index c232cb67dda2..7ae5c18c98a7 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -1465,6 +1465,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
>  {
>  	INIT_LIST_HEAD(&page->lru);
>  	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> +	hugetlb_set_page_subpool(page, NULL);
>  	set_hugetlb_cgroup(page, NULL);
>  	set_hugetlb_cgroup_rsvd(page, NULL);
>  	spin_lock(&hugetlb_lock);
> --
> 2.29.2
>
On Wed, Feb 24, 2021 at 5:43 PM Michal Hocko <mhocko@suse.com> wrote:
>
> On Tue 23-02-21 13:55:44, Mike Kravetz wrote:
> > Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> > with linux-next 5.12.0-20210222.
> > Call trace:
> >   hugepage_subpool_put_pages.part.0+0x2c/0x138
> >   __free_huge_page+0xce/0x310
> >   alloc_pool_huge_page+0x102/0x120
> >   set_max_huge_pages+0x13e/0x350
> >   hugetlb_sysctl_handler_common+0xd8/0x110
> >   hugetlb_sysctl_handler+0x48/0x58
> >   proc_sys_call_handler+0x138/0x238
> >   new_sync_write+0x10e/0x198
> >   vfs_write.part.0+0x12c/0x238
> >   ksys_write+0x68/0xf8
> >   do_syscall+0x82/0xd0
> >   __do_syscall+0xb4/0xc8
> >   system_call+0x72/0x98
> >
> > This is a result of the change which moved the hugetlb page subpool
> > pointer from page->private to page[1]->private. When new pages are
> > allocated from the buddy allocator, the private field of the head
> > page will be cleared, but the private field of subpages is not modified.
> > Therefore, old values may remain.
>
> Very interesting. I have expected that the page->private would be in a
> reasonable state when allocated. On the other hand hugetlb doesn't do
> __GFP_COMP so tail pages are not initialized by the allocator.

It seems that the buddy allocator does not initialize the private field
of the tail page even when we specify __GFP_COMP.

> > Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().
> >
> > Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
>
> This is not a stable sha to refer to as it comes from linux next.
>
> > Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
> > Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
>
> Acked-by: Michal Hocko <mhocko@suse.com>
>
> I think this would be worth a separate patch rather than having it
> folded into the original patch. This is really subtle.
>
> > ---
> >  mm/hugetlb.c | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index c232cb67dda2..7ae5c18c98a7 100644
> > --- a/mm/hugetlb.c
> > +++ b/mm/hugetlb.c
> > @@ -1465,6 +1465,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
> >  {
> >  	INIT_LIST_HEAD(&page->lru);
> >  	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
> > +	hugetlb_set_page_subpool(page, NULL);
> >  	set_hugetlb_cgroup(page, NULL);
> >  	set_hugetlb_cgroup_rsvd(page, NULL);
> >  	spin_lock(&hugetlb_lock);
> > --
> > 2.29.2
> >
>
> --
> Michal Hocko
> SUSE Labs
On Wed 24-02-21 19:10:42, Muchun Song wrote:
> On Wed, Feb 24, 2021 at 5:43 PM Michal Hocko <mhocko@suse.com> wrote:
> >
> > On Tue 23-02-21 13:55:44, Mike Kravetz wrote:
> > > Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
> > > with linux-next 5.12.0-20210222.
> > > Call trace:
> > >   hugepage_subpool_put_pages.part.0+0x2c/0x138
> > >   __free_huge_page+0xce/0x310
> > >   alloc_pool_huge_page+0x102/0x120
> > >   set_max_huge_pages+0x13e/0x350
> > >   hugetlb_sysctl_handler_common+0xd8/0x110
> > >   hugetlb_sysctl_handler+0x48/0x58
> > >   proc_sys_call_handler+0x138/0x238
> > >   new_sync_write+0x10e/0x198
> > >   vfs_write.part.0+0x12c/0x238
> > >   ksys_write+0x68/0xf8
> > >   do_syscall+0x82/0xd0
> > >   __do_syscall+0xb4/0xc8
> > >   system_call+0x72/0x98
> > >
> > > This is a result of the change which moved the hugetlb page subpool
> > > pointer from page->private to page[1]->private. When new pages are
> > > allocated from the buddy allocator, the private field of the head
> > > page will be cleared, but the private field of subpages is not modified.
> > > Therefore, old values may remain.
> >
> > Very interesting. I have expected that the page->private would be in a
> > reasonable state when allocated. On the other hand hugetlb doesn't do
> > __GFP_COMP so tail pages are not initialized by the allocator.
>
> It seems that the buddy allocator does not initialize the private field
> of the tail page even when we specify __GFP_COMP.

Yes it doesn't. What I meant to say is that even if it did, a lack of
__GFP_COMP would result in not doing so. I do not remember why hugetlb
doesn't use __GFP_COMP but I believe this was never the case.
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index c232cb67dda2..7ae5c18c98a7 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1465,6 +1465,7 @@ static void prep_new_huge_page(struct hstate *h, struct page *page, int nid)
 {
 	INIT_LIST_HEAD(&page->lru);
 	set_compound_page_dtor(page, HUGETLB_PAGE_DTOR);
+	hugetlb_set_page_subpool(page, NULL);
 	set_hugetlb_cgroup(page, NULL);
 	set_hugetlb_cgroup_rsvd(page, NULL);
 	spin_lock(&hugetlb_lock);
Gerald Schaefer reported a panic on s390 in hugepage_subpool_put_pages()
with linux-next 5.12.0-20210222.

Call trace:
  hugepage_subpool_put_pages.part.0+0x2c/0x138
  __free_huge_page+0xce/0x310
  alloc_pool_huge_page+0x102/0x120
  set_max_huge_pages+0x13e/0x350
  hugetlb_sysctl_handler_common+0xd8/0x110
  hugetlb_sysctl_handler+0x48/0x58
  proc_sys_call_handler+0x138/0x238
  new_sync_write+0x10e/0x198
  vfs_write.part.0+0x12c/0x238
  ksys_write+0x68/0xf8
  do_syscall+0x82/0xd0
  __do_syscall+0xb4/0xc8
  system_call+0x72/0x98

This is a result of the change which moved the hugetlb page subpool
pointer from page->private to page[1]->private. When new pages are
allocated from the buddy allocator, the private field of the head
page will be cleared, but the private field of subpages is not modified.
Therefore, old values may remain.

Fix by initializing hugetlb page subpool pointer in prep_new_huge_page().

Fixes: f1280272ae4d ("hugetlb: use page.private for hugetlb specific page flags")
Reported-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
---
 mm/hugetlb.c | 1 +
 1 file changed, 1 insertion(+)