Message ID | 20180518194519.3820-18-willy@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 05/18/2018 10:45 PM, Matthew Wilcox wrote: > From: Matthew Wilcox <mawilcox@microsoft.com> > > For diagnosing various performance and memory-leak problems, it is helpful > to be able to distinguish pages which are in use as VMalloc pages. > Unfortunately, we cannot use the page_type field in struct page, as > this is in use for mapcount by some drivers which map vmalloced pages > to userspace. > > Use a special page->mapping value to distinguish VMalloc pages from > other kinds of pages. Also record a pointer to the vm_struct and the > offset within the area in struct page to help reconstruct exactly what > this page is being used for. > This seems useless. page->vm_area and page->vm_offset are never used. There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems', and no explanation how is it can be used in current form. Also, this patch breaks code like this: if (mapping = page_mapping(page)) // access mapping
On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote: > On 05/18/2018 10:45 PM, Matthew Wilcox wrote: > > From: Matthew Wilcox <mawilcox@microsoft.com> > > > > For diagnosing various performance and memory-leak problems, it is helpful > > to be able to distinguish pages which are in use as VMalloc pages. > > Unfortunately, we cannot use the page_type field in struct page, as > > this is in use for mapcount by some drivers which map vmalloced pages > > to userspace. > > > > Use a special page->mapping value to distinguish VMalloc pages from > > other kinds of pages. Also record a pointer to the vm_struct and the > > offset within the area in struct page to help reconstruct exactly what > > this page is being used for. > > This seems useless. page->vm_area and page->vm_offset are never used. > There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems', > and no explanation how is it can be used in current form. Right now, it's by-hand. tools/vm/page-types.c will tell you which pages are allocated to VMalloc. Many people use kernel debuggers, crashdumps and similar to examine the kernel's memory. Leaving these breadcrumbs is helpful, and those fields simply weren't in use before. > Also, this patch breaks code like this: > if (mapping = page_mapping(page)) > // access mapping Example of broken code, please? Pages allocated from the page allocator with alloc_page() come with page->mapping == NULL. This code snippet would not have granted access to vmalloc pages before.
On Tue, 22 May 2018 10:58:36 -0700 Matthew Wilcox <willy@infradead.org> wrote: > On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote: > > On 05/18/2018 10:45 PM, Matthew Wilcox wrote: > > > From: Matthew Wilcox <mawilcox@microsoft.com> > > > > > > For diagnosing various performance and memory-leak problems, it is helpful > > > to be able to distinguish pages which are in use as VMalloc pages. > > > Unfortunately, we cannot use the page_type field in struct page, as > > > this is in use for mapcount by some drivers which map vmalloced pages > > > to userspace. > > > > > > Use a special page->mapping value to distinguish VMalloc pages from > > > other kinds of pages. Also record a pointer to the vm_struct and the > > > offset within the area in struct page to help reconstruct exactly what > > > this page is being used for. > > > > This seems useless. page->vm_area and page->vm_offset are never used. > > There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems', > > and no explanation how is it can be used in current form. > > Right now, it's by-hand. tools/vm/page-types.c will tell you which pages > are allocated to VMalloc. Many people use kernel debuggers, crashdumps > and similar to examine the kernel's memory. Leaving these breadcrumbs > is helpful, and those fields simply weren't in use before. I added this to the changelog: : No in-kernel code uses the new KPF_VMALLOC. Like the other KPF_* : flags, it is for use by tools/vm/page-types.c. > > Also, this patch breaks code like this: > > if (mapping = page_mapping(page)) > > // access mapping > > Example of broken code, please? Pages allocated from the page allocator > with alloc_page() come with page->mapping == NULL. This code snippet > would not have granted access to vmalloc pages before.
On 05/22/2018 08:58 PM, Matthew Wilcox wrote: > On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote: >> On 05/18/2018 10:45 PM, Matthew Wilcox wrote: >>> From: Matthew Wilcox <mawilcox@microsoft.com> >>> >>> For diagnosing various performance and memory-leak problems, it is helpful >>> to be able to distinguish pages which are in use as VMalloc pages. >>> Unfortunately, we cannot use the page_type field in struct page, as >>> this is in use for mapcount by some drivers which map vmalloced pages >>> to userspace. >>> >>> Use a special page->mapping value to distinguish VMalloc pages from >>> other kinds of pages. Also record a pointer to the vm_struct and the >>> offset within the area in struct page to help reconstruct exactly what >>> this page is being used for. >> >> This seems useless. page->vm_area and page->vm_offset are never used. >> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems', >> and no explanation how is it can be used in current form. > > Right now, it's by-hand. tools/vm/page-types.c will tell you which pages > are allocated to VMalloc. Many people use kernel debuggers, crashdumps > and similar to examine the kernel's memory. Leaving these breadcrumbs > is helpful, and those fields simply weren't in use before. > >> Also, this patch breaks code like this: >> if (mapping = page_mapping(page)) >> // access mapping > > Example of broken code, please? Pages allocated from the page allocator > with alloc_page() come with page->mapping == NULL. This code snippet > would not have granted access to vmalloc pages before. > Some implementation of the flush_dcache_page(), also set_page_dirty() can be called on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty()
On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote: > > > On 05/22/2018 08:58 PM, Matthew Wilcox wrote: > > On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote: > >> On 05/18/2018 10:45 PM, Matthew Wilcox wrote: > >>> From: Matthew Wilcox <mawilcox@microsoft.com> > >>> > >>> For diagnosing various performance and memory-leak problems, it is helpful > >>> to be able to distinguish pages which are in use as VMalloc pages. > >>> Unfortunately, we cannot use the page_type field in struct page, as > >>> this is in use for mapcount by some drivers which map vmalloced pages > >>> to userspace. > >>> > >>> Use a special page->mapping value to distinguish VMalloc pages from > >>> other kinds of pages. Also record a pointer to the vm_struct and the > >>> offset within the area in struct page to help reconstruct exactly what > >>> this page is being used for. > >> > >> This seems useless. page->vm_area and page->vm_offset are never used. > >> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems', > >> and no explanation how is it can be used in current form. > > > > Right now, it's by-hand. tools/vm/page-types.c will tell you which pages > > are allocated to VMalloc. Many people use kernel debuggers, crashdumps > > and similar to examine the kernel's memory. Leaving these breadcrumbs > > is helpful, and those fields simply weren't in use before. > > > >> Also, this patch breaks code like this: > >> if (mapping = page_mapping(page)) > >> // access mapping > > > > Example of broken code, please? Pages allocated from the page allocator > > with alloc_page() come with page->mapping == NULL. This code snippet > > would not have granted access to vmalloc pages before. > > > > Some implementation of the flush_dcache_page(), also set_page_dirty() can be called > on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty() Do you have any specific example? Why would anybody map vmalloc pages to the userspace? flush_dcache_page on a vmalloc page sounds quite unexpected to me as well.
On 05/23/2018 09:34 AM, Michal Hocko wrote: > On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote: >> >> >> On 05/22/2018 08:58 PM, Matthew Wilcox wrote: >>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote: >>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote: >>>>> From: Matthew Wilcox <mawilcox@microsoft.com> >>>>> >>>>> For diagnosing various performance and memory-leak problems, it is helpful >>>>> to be able to distinguish pages which are in use as VMalloc pages. >>>>> Unfortunately, we cannot use the page_type field in struct page, as >>>>> this is in use for mapcount by some drivers which map vmalloced pages >>>>> to userspace. >>>>> >>>>> Use a special page->mapping value to distinguish VMalloc pages from >>>>> other kinds of pages. Also record a pointer to the vm_struct and the >>>>> offset within the area in struct page to help reconstruct exactly what >>>>> this page is being used for. >>>> >>>> This seems useless. page->vm_area and page->vm_offset are never used. >>>> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems', >>>> and no explanation how is it can be used in current form. >>> >>> Right now, it's by-hand. tools/vm/page-types.c will tell you which pages >>> are allocated to VMalloc. Many people use kernel debuggers, crashdumps >>> and similar to examine the kernel's memory. Leaving these breadcrumbs >>> is helpful, and those fields simply weren't in use before. >>> >>>> Also, this patch breaks code like this: >>>> if (mapping = page_mapping(page)) >>>> // access mapping >>> >>> Example of broken code, please? Pages allocated from the page allocator >>> with alloc_page() come with page->mapping == NULL. This code snippet >>> would not have granted access to vmalloc pages before. >>> >> >> Some implementation of the flush_dcache_page(), also set_page_dirty() can be called >> on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty() > > Do you have any specific example? git grep -e remap_vmalloc_range -e vmalloc_user But that's not all, vmalloc*() + vmalloc_to_page() + vm_insert_page() are another candidates. > Why would anybody map vmalloc pages to the userspace? To have shared memory between usespace and the kernel. > flush_dcache_page on a vmalloc page sounds quite > unexpected to me as well. > remap_vmalloc_range()->vm_insret_page()->insert_page()->flush_dcache_page()
On Wed 23-05-18 12:14:10, Andrey Ryabinin wrote: > > > On 05/23/2018 09:34 AM, Michal Hocko wrote: > > On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote: > >> > >> > >> On 05/22/2018 08:58 PM, Matthew Wilcox wrote: > >>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote: > >>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote: > >>>>> From: Matthew Wilcox <mawilcox@microsoft.com> > >>>>> > >>>>> For diagnosing various performance and memory-leak problems, it is helpful > >>>>> to be able to distinguish pages which are in use as VMalloc pages. > >>>>> Unfortunately, we cannot use the page_type field in struct page, as > >>>>> this is in use for mapcount by some drivers which map vmalloced pages > >>>>> to userspace. > >>>>> > >>>>> Use a special page->mapping value to distinguish VMalloc pages from > >>>>> other kinds of pages. Also record a pointer to the vm_struct and the > >>>>> offset within the area in struct page to help reconstruct exactly what > >>>>> this page is being used for. > >>>> > >>>> This seems useless. page->vm_area and page->vm_offset are never used. > >>>> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems', > >>>> and no explanation how is it can be used in current form. > >>> > >>> Right now, it's by-hand. tools/vm/page-types.c will tell you which pages > >>> are allocated to VMalloc. Many people use kernel debuggers, crashdumps > >>> and similar to examine the kernel's memory. Leaving these breadcrumbs > >>> is helpful, and those fields simply weren't in use before. > >>> > >>>> Also, this patch breaks code like this: > >>>> if (mapping = page_mapping(page)) > >>>> // access mapping > >>> > >>> Example of broken code, please? Pages allocated from the page allocator > >>> with alloc_page() come with page->mapping == NULL. This code snippet > >>> would not have granted access to vmalloc pages before. > >>> > >> > >> Some implementation of the flush_dcache_page(), also set_page_dirty() can be called > >> on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty() > > > > Do you have any specific example? > > git grep -e remap_vmalloc_range -e vmalloc_user > > But that's not all, vmalloc*() + vmalloc_to_page() + vm_insert_page() are another candidates. Thanks for the pointer. I was not aware of remap_vmalloc_range. > > > Why would anybody map vmalloc pages to the userspace? > > To have shared memory between usespace and the kernel. OK, so the point seems to be to share large physically contiguous memory with userspace. > > flush_dcache_page on a vmalloc page sounds quite > > unexpected to me as well. > > > > remap_vmalloc_range()->vm_insret_page()->insert_page()->flush_dcache_page() Thanks!
On 05/23/2018 12:25 PM, Michal Hocko wrote: > On Wed 23-05-18 12:14:10, Andrey Ryabinin wrote: >> >> >> On 05/23/2018 09:34 AM, Michal Hocko wrote: >>> On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote: >>>> >>>> >>>> On 05/22/2018 08:58 PM, Matthew Wilcox wrote: >>>>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote: >>>>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote: >>>>>>> From: Matthew Wilcox <mawilcox@microsoft.com> >>>>>>> >>>>>>> For diagnosing various performance and memory-leak problems, it is helpful >>>>>>> to be able to distinguish pages which are in use as VMalloc pages. >>>>>>> Unfortunately, we cannot use the page_type field in struct page, as >>>>>>> this is in use for mapcount by some drivers which map vmalloced pages >>>>>>> to userspace. >>>>>>> >>>>>>> Use a special page->mapping value to distinguish VMalloc pages from >>>>>>> other kinds of pages. Also record a pointer to the vm_struct and the >>>>>>> offset within the area in struct page to help reconstruct exactly what >>>>>>> this page is being used for. >>>>>> >>>>>> This seems useless. page->vm_area and page->vm_offset are never used. >>>>>> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems', >>>>>> and no explanation how is it can be used in current form. >>>>> >>>>> Right now, it's by-hand. tools/vm/page-types.c will tell you which pages >>>>> are allocated to VMalloc. Many people use kernel debuggers, crashdumps >>>>> and similar to examine the kernel's memory. Leaving these breadcrumbs >>>>> is helpful, and those fields simply weren't in use before. >>>>> >>>>>> Also, this patch breaks code like this: >>>>>> if (mapping = page_mapping(page)) >>>>>> // access mapping >>>>> >>>>> Example of broken code, please? Pages allocated from the page allocator >>>>> with alloc_page() come with page->mapping == NULL. This code snippet >>>>> would not have granted access to vmalloc pages before. >>>>> >>>> >>>> Some implementation of the flush_dcache_page(), also set_page_dirty() can be called >>>> on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty() >>> >>> Do you have any specific example? >> >> git grep -e remap_vmalloc_range -e vmalloc_user >> >> But that's not all, vmalloc*() + vmalloc_to_page() + vm_insert_page() are another candidates. > > Thanks for the pointer. I was not aware of remap_vmalloc_range. >> >>> Why would anybody map vmalloc pages to the userspace? >> >> To have shared memory between usespace and the kernel. > > OK, so the point seems to be to share large physically contiguous memory > with userspace. > Not physically, but virtually contiguous.
On Wed 23-05-18 12:28:10, Andrey Ryabinin wrote: > On 05/23/2018 12:25 PM, Michal Hocko wrote: > > OK, so the point seems to be to share large physically contiguous memory > > with userspace. > > > > Not physically, but virtually contiguous. Ble, you are right! That's what I meant...
diff --git a/fs/proc/page.c b/fs/proc/page.c index 792c78a49174..fc83dae1af7b 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -156,6 +156,8 @@ u64 stable_page_flags(struct page *page) u |= 1 << KPF_BALLOON; if (PageTable(page)) u |= 1 << KPF_PGTABLE; + if (PageVMalloc(page)) + u |= 1 << KPF_VMALLOC; if (page_is_idle(page)) u |= 1 << KPF_IDLE; diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h index 530a9a2b039b..9a3b677e2c1d 100644 --- a/include/linux/mm_types.h +++ b/include/linux/mm_types.h @@ -146,6 +146,11 @@ struct page { spinlock_t ptl; #endif }; + struct { /* VMalloc pages */ + struct vm_struct *vm_area; + unsigned long vm_offset; + unsigned long _vm_id; /* MAPPING_VMalloc */ + }; struct { /* ZONE_DEVICE pages */ /** @pgmap: Points to the hosting device page map. */ struct dev_pagemap *pgmap; diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h index 901943e4754b..5232433175c1 100644 --- a/include/linux/page-flags.h +++ b/include/linux/page-flags.h @@ -699,6 +699,31 @@ PAGE_TYPE_OPS(Kmemcg, kmemcg) */ PAGE_TYPE_OPS(Table, table) +/* + * vmalloc pages may be mapped to userspace, so we need some other way + * to distinguish them from other kinds of pages. Use page->mapping + * for this purpose. Values below 0x1000 cannot be real pointers. + */ +#define MAPPING_VMalloc (void *)0x440 + +#define PAGE_MAPPING_OPS(name) \ +static __always_inline int Page##name(struct page *page) \ +{ \ + return page->mapping == MAPPING_##name; \ +} \ +static __always_inline void __SetPage##name(struct page *page) \ +{ \ + VM_BUG_ON_PAGE(page->mapping != NULL, page); \ + page->mapping = MAPPING_##name; \ +} \ +static __always_inline void __ClearPage##name(struct page *page) \ +{ \ + VM_BUG_ON_PAGE(page->mapping != MAPPING_##name, page); \ + page->mapping = NULL; \ +} + +PAGE_MAPPING_OPS(VMalloc) + extern bool is_free_buddy_page(struct page *page); __PAGEFLAG(Isolated, isolated, PF_ANY); diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h index 21b9113c69da..6800968b8f47 100644 --- a/include/uapi/linux/kernel-page-flags.h +++ b/include/uapi/linux/kernel-page-flags.h @@ -36,5 +36,6 @@ #define KPF_ZERO_PAGE 24 #define KPF_IDLE 25 #define KPF_PGTABLE 26 +#define KPF_VMALLOC 27 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */ diff --git a/mm/vmalloc.c b/mm/vmalloc.c index 5fbf27e7f956..98bc690d472d 100644 --- a/mm/vmalloc.c +++ b/mm/vmalloc.c @@ -1535,7 +1535,7 @@ static void __vunmap(const void *addr, int deallocate_pages) for (i = 0; i < area->nr_pages; i++) { struct page *page = area->pages[i]; - BUG_ON(!page); + __ClearPageVMalloc(page); __free_pages(page, 0); } @@ -1704,6 +1704,9 @@ static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask, area->nr_pages = i; goto fail; } + __SetPageVMalloc(page); + page->vm_area = area; + page->vm_offset = i; area->pages[i] = page; if (gfpflags_allow_blocking(gfp_mask)) cond_resched(); diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c index cce853dca691..25cc21855be4 100644 --- a/tools/vm/page-types.c +++ b/tools/vm/page-types.c @@ -132,6 +132,7 @@ static const char * const page_flag_names[] = { [KPF_THP] = "t:thp", [KPF_BALLOON] = "o:balloon", [KPF_PGTABLE] = "g:pgtable", + [KPF_VMALLOC] = "V:vmalloc", [KPF_ZERO_PAGE] = "z:zero_page", [KPF_IDLE] = "i:idle_page",