diff mbox

[v6,17/17] mm: Distinguish VMalloc pages

Message ID 20180518194519.3820-18-willy@infradead.org (mailing list archive)
State New, archived
Headers show

Commit Message

Matthew Wilcox (Oracle) May 18, 2018, 7:45 p.m. UTC
From: Matthew Wilcox <mawilcox@microsoft.com>

For diagnosing various performance and memory-leak problems, it is helpful
to be able to distinguish pages which are in use as VMalloc pages.
Unfortunately, we cannot use the page_type field in struct page, as
this is in use for mapcount by some drivers which map vmalloced pages
to userspace.

Use a special page->mapping value to distinguish VMalloc pages from
other kinds of pages.  Also record a pointer to the vm_struct and the
offset within the area in struct page to help reconstruct exactly what
this page is being used for.

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 fs/proc/page.c                         |  2 ++
 include/linux/mm_types.h               |  5 +++++
 include/linux/page-flags.h             | 25 +++++++++++++++++++++++++
 include/uapi/linux/kernel-page-flags.h |  1 +
 mm/vmalloc.c                           |  5 ++++-
 tools/vm/page-types.c                  |  1 +
 6 files changed, 38 insertions(+), 1 deletion(-)

Comments

Andrey Ryabinin May 22, 2018, 4:10 p.m. UTC | #1
On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
> 
> For diagnosing various performance and memory-leak problems, it is helpful
> to be able to distinguish pages which are in use as VMalloc pages.
> Unfortunately, we cannot use the page_type field in struct page, as
> this is in use for mapcount by some drivers which map vmalloced pages
> to userspace.
> 
> Use a special page->mapping value to distinguish VMalloc pages from
> other kinds of pages.  Also record a pointer to the vm_struct and the
> offset within the area in struct page to help reconstruct exactly what
> this page is being used for.
> 


This seems useless. page->vm_area and page->vm_offset are never used.
There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems',
and no explanation how is it can be used in current form.

Also, this patch breaks code like this:
	if (mapping = page_mapping(page))
		// access mapping
Matthew Wilcox (Oracle) May 22, 2018, 5:58 p.m. UTC | #2
On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> > 
> > For diagnosing various performance and memory-leak problems, it is helpful
> > to be able to distinguish pages which are in use as VMalloc pages.
> > Unfortunately, we cannot use the page_type field in struct page, as
> > this is in use for mapcount by some drivers which map vmalloced pages
> > to userspace.
> > 
> > Use a special page->mapping value to distinguish VMalloc pages from
> > other kinds of pages.  Also record a pointer to the vm_struct and the
> > offset within the area in struct page to help reconstruct exactly what
> > this page is being used for.
> 
> This seems useless. page->vm_area and page->vm_offset are never used.
> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> and no explanation how is it can be used in current form.

Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
and similar to examine the kernel's memory.  Leaving these breadcrumbs
is helpful, and those fields simply weren't in use before.

> Also, this patch breaks code like this:
> 	if (mapping = page_mapping(page))
> 		// access mapping

Example of broken code, please?  Pages allocated from the page allocator
with alloc_page() come with page->mapping == NULL.  This code snippet
would not have granted access to vmalloc pages before.
Andrew Morton May 22, 2018, 7:49 p.m. UTC | #3
On Tue, 22 May 2018 10:58:36 -0700 Matthew Wilcox <willy@infradead.org> wrote:

> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> > On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> > > From: Matthew Wilcox <mawilcox@microsoft.com>
> > > 
> > > For diagnosing various performance and memory-leak problems, it is helpful
> > > to be able to distinguish pages which are in use as VMalloc pages.
> > > Unfortunately, we cannot use the page_type field in struct page, as
> > > this is in use for mapcount by some drivers which map vmalloced pages
> > > to userspace.
> > > 
> > > Use a special page->mapping value to distinguish VMalloc pages from
> > > other kinds of pages.  Also record a pointer to the vm_struct and the
> > > offset within the area in struct page to help reconstruct exactly what
> > > this page is being used for.
> > 
> > This seems useless. page->vm_area and page->vm_offset are never used.
> > There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> > and no explanation how is it can be used in current form.
> 
> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> and similar to examine the kernel's memory.  Leaving these breadcrumbs
> is helpful, and those fields simply weren't in use before.

I added this to the changelog:

: No in-kernel code uses the new KPF_VMALLOC.  Like the other KPF_*
: flags, it is for use by tools/vm/page-types.c.

> > Also, this patch breaks code like this:
> > 	if (mapping = page_mapping(page))
> > 		// access mapping
> 
> Example of broken code, please?  Pages allocated from the page allocator
> with alloc_page() come with page->mapping == NULL.  This code snippet
> would not have granted access to vmalloc pages before.
Andrey Ryabinin May 22, 2018, 7:57 p.m. UTC | #4
On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
>>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>>
>>> For diagnosing various performance and memory-leak problems, it is helpful
>>> to be able to distinguish pages which are in use as VMalloc pages.
>>> Unfortunately, we cannot use the page_type field in struct page, as
>>> this is in use for mapcount by some drivers which map vmalloced pages
>>> to userspace.
>>>
>>> Use a special page->mapping value to distinguish VMalloc pages from
>>> other kinds of pages.  Also record a pointer to the vm_struct and the
>>> offset within the area in struct page to help reconstruct exactly what
>>> this page is being used for.
>>
>> This seems useless. page->vm_area and page->vm_offset are never used.
>> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems',
>> and no explanation how is it can be used in current form.
> 
> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> and similar to examine the kernel's memory.  Leaving these breadcrumbs
> is helpful, and those fields simply weren't in use before.
> 
>> Also, this patch breaks code like this:
>> 	if (mapping = page_mapping(page))
>> 		// access mapping
> 
> Example of broken code, please?  Pages allocated from the page allocator
> with alloc_page() come with page->mapping == NULL.  This code snippet
> would not have granted access to vmalloc pages before.
> 

Some implementation of the flush_dcache_page(), also set_page_dirty() can be called
on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty()
Michal Hocko May 23, 2018, 6:34 a.m. UTC | #5
On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote:
> 
> 
> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> > On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> >> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> >>> From: Matthew Wilcox <mawilcox@microsoft.com>
> >>>
> >>> For diagnosing various performance and memory-leak problems, it is helpful
> >>> to be able to distinguish pages which are in use as VMalloc pages.
> >>> Unfortunately, we cannot use the page_type field in struct page, as
> >>> this is in use for mapcount by some drivers which map vmalloced pages
> >>> to userspace.
> >>>
> >>> Use a special page->mapping value to distinguish VMalloc pages from
> >>> other kinds of pages.  Also record a pointer to the vm_struct and the
> >>> offset within the area in struct page to help reconstruct exactly what
> >>> this page is being used for.
> >>
> >> This seems useless. page->vm_area and page->vm_offset are never used.
> >> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> >> and no explanation how is it can be used in current form.
> > 
> > Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> > are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> > and similar to examine the kernel's memory.  Leaving these breadcrumbs
> > is helpful, and those fields simply weren't in use before.
> > 
> >> Also, this patch breaks code like this:
> >> 	if (mapping = page_mapping(page))
> >> 		// access mapping
> > 
> > Example of broken code, please?  Pages allocated from the page allocator
> > with alloc_page() come with page->mapping == NULL.  This code snippet
> > would not have granted access to vmalloc pages before.
> > 
> 
> Some implementation of the flush_dcache_page(), also set_page_dirty() can be called
> on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty()

Do you have any specific example? Why would anybody map vmalloc pages to
the userspace? flush_dcache_page on a vmalloc page sounds quite
unexpected to me as well.
Andrey Ryabinin May 23, 2018, 9:14 a.m. UTC | #6
On 05/23/2018 09:34 AM, Michal Hocko wrote:
> On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote:
>>
>>
>> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
>>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
>>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
>>>>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>>>>
>>>>> For diagnosing various performance and memory-leak problems, it is helpful
>>>>> to be able to distinguish pages which are in use as VMalloc pages.
>>>>> Unfortunately, we cannot use the page_type field in struct page, as
>>>>> this is in use for mapcount by some drivers which map vmalloced pages
>>>>> to userspace.
>>>>>
>>>>> Use a special page->mapping value to distinguish VMalloc pages from
>>>>> other kinds of pages.  Also record a pointer to the vm_struct and the
>>>>> offset within the area in struct page to help reconstruct exactly what
>>>>> this page is being used for.
>>>>
>>>> This seems useless. page->vm_area and page->vm_offset are never used.
>>>> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems',
>>>> and no explanation how is it can be used in current form.
>>>
>>> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
>>> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
>>> and similar to examine the kernel's memory.  Leaving these breadcrumbs
>>> is helpful, and those fields simply weren't in use before.
>>>
>>>> Also, this patch breaks code like this:
>>>> 	if (mapping = page_mapping(page))
>>>> 		// access mapping
>>>
>>> Example of broken code, please?  Pages allocated from the page allocator
>>> with alloc_page() come with page->mapping == NULL.  This code snippet
>>> would not have granted access to vmalloc pages before.
>>>
>>
>> Some implementation of the flush_dcache_page(), also set_page_dirty() can be called
>> on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty()
> 
> Do you have any specific example?

git grep -e remap_vmalloc_range -e vmalloc_user

But that's not all, vmalloc*() + vmalloc_to_page() + vm_insert_page() are another candidates.

> Why would anybody map vmalloc pages to the userspace?

To have shared memory between usespace and the kernel.

> flush_dcache_page on a vmalloc page sounds quite
> unexpected to me as well.
> 

remap_vmalloc_range()->vm_insret_page()->insert_page()->flush_dcache_page()
Michal Hocko May 23, 2018, 9:25 a.m. UTC | #7
On Wed 23-05-18 12:14:10, Andrey Ryabinin wrote:
> 
> 
> On 05/23/2018 09:34 AM, Michal Hocko wrote:
> > On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote:
> >>
> >>
> >> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
> >>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
> >>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
> >>>>> From: Matthew Wilcox <mawilcox@microsoft.com>
> >>>>>
> >>>>> For diagnosing various performance and memory-leak problems, it is helpful
> >>>>> to be able to distinguish pages which are in use as VMalloc pages.
> >>>>> Unfortunately, we cannot use the page_type field in struct page, as
> >>>>> this is in use for mapcount by some drivers which map vmalloced pages
> >>>>> to userspace.
> >>>>>
> >>>>> Use a special page->mapping value to distinguish VMalloc pages from
> >>>>> other kinds of pages.  Also record a pointer to the vm_struct and the
> >>>>> offset within the area in struct page to help reconstruct exactly what
> >>>>> this page is being used for.
> >>>>
> >>>> This seems useless. page->vm_area and page->vm_offset are never used.
> >>>> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems',
> >>>> and no explanation how is it can be used in current form.
> >>>
> >>> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
> >>> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
> >>> and similar to examine the kernel's memory.  Leaving these breadcrumbs
> >>> is helpful, and those fields simply weren't in use before.
> >>>
> >>>> Also, this patch breaks code like this:
> >>>> 	if (mapping = page_mapping(page))
> >>>> 		// access mapping
> >>>
> >>> Example of broken code, please?  Pages allocated from the page allocator
> >>> with alloc_page() come with page->mapping == NULL.  This code snippet
> >>> would not have granted access to vmalloc pages before.
> >>>
> >>
> >> Some implementation of the flush_dcache_page(), also set_page_dirty() can be called
> >> on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty()
> > 
> > Do you have any specific example?
> 
> git grep -e remap_vmalloc_range -e vmalloc_user
> 
> But that's not all, vmalloc*() + vmalloc_to_page() + vm_insert_page() are another candidates.

Thanks for the pointer. I was not aware of remap_vmalloc_range.
> 
> > Why would anybody map vmalloc pages to the userspace?
> 
> To have shared memory between usespace and the kernel.

OK, so the point seems to be to share large physically contiguous memory
with userspace.

> > flush_dcache_page on a vmalloc page sounds quite
> > unexpected to me as well.
> > 
> 
> remap_vmalloc_range()->vm_insret_page()->insert_page()->flush_dcache_page()

Thanks!
Andrey Ryabinin May 23, 2018, 9:28 a.m. UTC | #8
On 05/23/2018 12:25 PM, Michal Hocko wrote:
> On Wed 23-05-18 12:14:10, Andrey Ryabinin wrote:
>>
>>
>> On 05/23/2018 09:34 AM, Michal Hocko wrote:
>>> On Tue 22-05-18 22:57:34, Andrey Ryabinin wrote:
>>>>
>>>>
>>>> On 05/22/2018 08:58 PM, Matthew Wilcox wrote:
>>>>> On Tue, May 22, 2018 at 07:10:52PM +0300, Andrey Ryabinin wrote:
>>>>>> On 05/18/2018 10:45 PM, Matthew Wilcox wrote:
>>>>>>> From: Matthew Wilcox <mawilcox@microsoft.com>
>>>>>>>
>>>>>>> For diagnosing various performance and memory-leak problems, it is helpful
>>>>>>> to be able to distinguish pages which are in use as VMalloc pages.
>>>>>>> Unfortunately, we cannot use the page_type field in struct page, as
>>>>>>> this is in use for mapcount by some drivers which map vmalloced pages
>>>>>>> to userspace.
>>>>>>>
>>>>>>> Use a special page->mapping value to distinguish VMalloc pages from
>>>>>>> other kinds of pages.  Also record a pointer to the vm_struct and the
>>>>>>> offset within the area in struct page to help reconstruct exactly what
>>>>>>> this page is being used for.
>>>>>>
>>>>>> This seems useless. page->vm_area and page->vm_offset are never used.
>>>>>> There are no follow up patches which use this new information 'For diagnosing various performance and memory-leak problems',
>>>>>> and no explanation how is it can be used in current form.
>>>>>
>>>>> Right now, it's by-hand.  tools/vm/page-types.c will tell you which pages
>>>>> are allocated to VMalloc.  Many people use kernel debuggers, crashdumps
>>>>> and similar to examine the kernel's memory.  Leaving these breadcrumbs
>>>>> is helpful, and those fields simply weren't in use before.
>>>>>
>>>>>> Also, this patch breaks code like this:
>>>>>> 	if (mapping = page_mapping(page))
>>>>>> 		// access mapping
>>>>>
>>>>> Example of broken code, please?  Pages allocated from the page allocator
>>>>> with alloc_page() come with page->mapping == NULL.  This code snippet
>>>>> would not have granted access to vmalloc pages before.
>>>>>
>>>>
>>>> Some implementation of the flush_dcache_page(), also set_page_dirty() can be called
>>>> on userspace-mapped vmalloc pages during unmap - zap_pte_range() -> set_page_dirty()
>>>
>>> Do you have any specific example?
>>
>> git grep -e remap_vmalloc_range -e vmalloc_user
>>
>> But that's not all, vmalloc*() + vmalloc_to_page() + vm_insert_page() are another candidates.
> 
> Thanks for the pointer. I was not aware of remap_vmalloc_range.
>>
>>> Why would anybody map vmalloc pages to the userspace?
>>
>> To have shared memory between usespace and the kernel.
> 
> OK, so the point seems to be to share large physically contiguous memory
> with userspace.
> 

Not physically, but virtually contiguous.
Michal Hocko May 23, 2018, 9:52 a.m. UTC | #9
On Wed 23-05-18 12:28:10, Andrey Ryabinin wrote:
> On 05/23/2018 12:25 PM, Michal Hocko wrote:
> > OK, so the point seems to be to share large physically contiguous memory
> > with userspace.
> > 
> 
> Not physically, but virtually contiguous.

Ble, you are right! That's what I meant...
diff mbox

Patch

diff --git a/fs/proc/page.c b/fs/proc/page.c
index 792c78a49174..fc83dae1af7b 100644
--- a/fs/proc/page.c
+++ b/fs/proc/page.c
@@ -156,6 +156,8 @@  u64 stable_page_flags(struct page *page)
 		u |= 1 << KPF_BALLOON;
 	if (PageTable(page))
 		u |= 1 << KPF_PGTABLE;
+	if (PageVMalloc(page))
+		u |= 1 << KPF_VMALLOC;
 
 	if (page_is_idle(page))
 		u |= 1 << KPF_IDLE;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 530a9a2b039b..9a3b677e2c1d 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -146,6 +146,11 @@  struct page {
 			spinlock_t ptl;
 #endif
 		};
+		struct {	/* VMalloc pages */
+			struct vm_struct *vm_area;
+			unsigned long vm_offset;
+			unsigned long _vm_id;	/* MAPPING_VMalloc */
+		};
 		struct {	/* ZONE_DEVICE pages */
 			/** @pgmap: Points to the hosting device page map. */
 			struct dev_pagemap *pgmap;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 901943e4754b..5232433175c1 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -699,6 +699,31 @@  PAGE_TYPE_OPS(Kmemcg, kmemcg)
  */
 PAGE_TYPE_OPS(Table, table)
 
+/*
+ * vmalloc pages may be mapped to userspace, so we need some other way
+ * to distinguish them from other kinds of pages.  Use page->mapping
+ * for this purpose.  Values below 0x1000 cannot be real pointers.
+ */
+#define MAPPING_VMalloc		(void *)0x440
+
+#define PAGE_MAPPING_OPS(name)						\
+static __always_inline int Page##name(struct page *page)		\
+{									\
+	return page->mapping == MAPPING_##name;				\
+}									\
+static __always_inline void __SetPage##name(struct page *page)		\
+{									\
+	VM_BUG_ON_PAGE(page->mapping != NULL, page);			\
+	page->mapping = MAPPING_##name;					\
+}									\
+static __always_inline void __ClearPage##name(struct page *page)	\
+{									\
+	VM_BUG_ON_PAGE(page->mapping != MAPPING_##name, page);		\
+	page->mapping = NULL;						\
+}
+
+PAGE_MAPPING_OPS(VMalloc)
+
 extern bool is_free_buddy_page(struct page *page);
 
 __PAGEFLAG(Isolated, isolated, PF_ANY);
diff --git a/include/uapi/linux/kernel-page-flags.h b/include/uapi/linux/kernel-page-flags.h
index 21b9113c69da..6800968b8f47 100644
--- a/include/uapi/linux/kernel-page-flags.h
+++ b/include/uapi/linux/kernel-page-flags.h
@@ -36,5 +36,6 @@ 
 #define KPF_ZERO_PAGE		24
 #define KPF_IDLE		25
 #define KPF_PGTABLE		26
+#define KPF_VMALLOC		27
 
 #endif /* _UAPILINUX_KERNEL_PAGE_FLAGS_H */
diff --git a/mm/vmalloc.c b/mm/vmalloc.c
index 5fbf27e7f956..98bc690d472d 100644
--- a/mm/vmalloc.c
+++ b/mm/vmalloc.c
@@ -1535,7 +1535,7 @@  static void __vunmap(const void *addr, int deallocate_pages)
 		for (i = 0; i < area->nr_pages; i++) {
 			struct page *page = area->pages[i];
 
-			BUG_ON(!page);
+			__ClearPageVMalloc(page);
 			__free_pages(page, 0);
 		}
 
@@ -1704,6 +1704,9 @@  static void *__vmalloc_area_node(struct vm_struct *area, gfp_t gfp_mask,
 			area->nr_pages = i;
 			goto fail;
 		}
+		__SetPageVMalloc(page);
+		page->vm_area = area;
+		page->vm_offset = i;
 		area->pages[i] = page;
 		if (gfpflags_allow_blocking(gfp_mask))
 			cond_resched();
diff --git a/tools/vm/page-types.c b/tools/vm/page-types.c
index cce853dca691..25cc21855be4 100644
--- a/tools/vm/page-types.c
+++ b/tools/vm/page-types.c
@@ -132,6 +132,7 @@  static const char * const page_flag_names[] = {
 	[KPF_THP]		= "t:thp",
 	[KPF_BALLOON]		= "o:balloon",
 	[KPF_PGTABLE]		= "g:pgtable",
+	[KPF_VMALLOC]		= "V:vmalloc",
 	[KPF_ZERO_PAGE]		= "z:zero_page",
 	[KPF_IDLE]              = "i:idle_page",