mm/memory: add refcount for special mapping page in copy_one_pte()

Message ID 1587119514-29679-1-git-send-email-qiwuchen55@gmail.com (mailing list archive)
State New, archived

Commit Message

chenqiwu April 17, 2020, 10:31 a.m. UTC
From: chenqiwu <chenqiwu@xiaomi.com>

If copy_one_pte() encounters a special mapping page, such as a device
mapping page or the zero page, it is necessary to increment the page
refcount.

Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
---
 mm/memory.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Michal Hocko April 17, 2020, 11:43 a.m. UTC | #1
On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> From: chenqiwu <chenqiwu@xiaomi.com>
> 
> If we get a special mapping page like device mapping page or zero page
> when copy_one_pte, it's necessary add the page refcount count.

From the changelog it is not clear what the actual problem is and how
the patch addresses it. Please be more verbose.

> Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
> ---
>  mm/memory.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/mm/memory.c b/mm/memory.c
> index f703fe8..a57975a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -802,8 +802,9 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>  		get_page(page);
>  		page_dup_rmap(page, false);
>  		rss[mm_counter(page)]++;
> -	} else if (pte_devmap(pte)) {
> +	} else if (pte_devmap(pte) || is_zero_pfn(pte_pfn(pte))) {
>  		page = pte_page(pte);
> +		get_page(page);
>  	}
>  
>  out_set_pte:
> -- 
> 1.9.1
>

chenqiwu April 17, 2020, 2:26 p.m. UTC | #2
On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > From: chenqiwu <chenqiwu@xiaomi.com>
> > 
> > If we get a special mapping page like device mapping page or zero page
> > when copy_one_pte, it's necessary add the page refcount count.
> 
> From the changelog it is not clear what is the actual problem and how
> the patch address it. Please be more verbose.
>
I haven't found an actual problem, but I think the page refcount should
be updated for special mappings, including devmap and zero pages,
rather than left untouched, since we copy the pte from one task to the
other.

Matthew Wilcox (Oracle) April 17, 2020, 2:45 p.m. UTC | #3
On Fri, Apr 17, 2020 at 10:26:18PM +0800, chenqiwu wrote:
> On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> > On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > > From: chenqiwu <chenqiwu@xiaomi.com>
> > > 
> > > If we get a special mapping page like device mapping page or zero page
> > > when copy_one_pte, it's necessary add the page refcount count.
> > 
> > From the changelog it is not clear what is the actual problem and how
> > the patch address it. Please be more verbose.
> >
> I don't find any actual problem, but I think there should be addressed
> to update the page refcount for special mappings include devmap and zero
> page instead of doing nothing else, since we copy the pte from one task
> to the other.

But the zero page is special.  It's never freed.  So unless we're seeing
a refcount problem with the zero page, I would suggest that your patch
is eventually going to overflow the refcount on the zero page.

Michal Hocko April 17, 2020, 3:23 p.m. UTC | #4
On Fri 17-04-20 22:26:18, chenqiwu wrote:
> On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> > On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > > From: chenqiwu <chenqiwu@xiaomi.com>
> > > 
> > > If we get a special mapping page like device mapping page or zero page
> > > when copy_one_pte, it's necessary add the page refcount count.
> > 
> > From the changelog it is not clear what is the actual problem and how
> > the patch address it. Please be more verbose.
> >
> I don't find any actual problem, but I think there should be addressed
> to update the page refcount for special mappings include devmap and zero
> page instead of doing nothing else, since we copy the pte from one task
> to the other.

As Matthew pointed out, zero pages are special. Just check how
vm_normal_page returns NULL for them (the same is the case for
pte_devmap). This means, among other things, that zap_pte_range, which
is called during munmap, only clears the pte and does not operate on
those pages, so there is no put_page to match your get_page here.
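
To make the imbalance concrete, here is a minimal userspace sketch. It is a toy model and not kernel code: struct toy_page and every toy_* function are hypothetical names invented for illustration, and the "special" flag merely stands in for the zero page or a devmap page. It assumes the copy side takes a reference unconditionally (as the proposed patch would) while the unmap side only releases pages that the normal-page lookup reports.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct page: just a reference count. */
struct toy_page {
	int refcount;
	bool special;	/* models the zero page or a devmap page */
};

/* Models the idea behind vm_normal_page(): special pages yield NULL. */
static struct toy_page *toy_normal_page(struct toy_page *page)
{
	return page->special ? NULL : page;
}

/* Models the fork-time copy WITH the proposed patch applied:
 * every mapped page gains a reference, special or not. */
static void toy_copy_pte_patched(struct toy_page *page)
{
	page->refcount++;
}

/* Models unmap: only pages reported as "normal" are released. */
static void toy_zap_pte(struct toy_page *page)
{
	struct toy_page *normal = toy_normal_page(page);

	if (normal)
		normal->refcount--;
}

/* Run `cycles` copy-then-unmap rounds and return the references
 * that were taken but never dropped. */
int toy_leaked_refs(bool special, int cycles)
{
	struct toy_page page = { .refcount = 1, .special = special };

	for (int i = 0; i < cycles; i++) {
		toy_copy_pte_patched(&page);
		toy_zap_pte(&page);
	}
	return page.refcount - 1;
}
```

In this model a normal page ends every round balanced, while a special page gains one reference per round that nothing ever drops, so the counter only grows.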

I do realize that this might be a subtle detail that might be
confusing. On the other hand, trying to formulate the specific problem
and add an explanation of the fix to the changelog could have revealed
this. It is really trivial to generate mappings backed by zero pages, and
if the reference count were not handled properly it would blow up
pretty quickly.
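
As a rough illustration of how easy such mappings are to create, the userspace sketch below maps private anonymous memory, read-faults it, and then forks. The function name zero_page_fork_demo is hypothetical. Two things here are assumptions rather than statements from this thread: that read faults on untouched private anonymous memory are served by the shared zero page (the usual Linux behaviour, though transparent huge pages or architecture details may change it), and that writing to one page first makes fork copy the page tables for this mapping instead of leaving the child to refault it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE	/* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Map npages of private anonymous memory, touch it, fork, and let the
 * child check the contents. Returns 0 on success, -1 on any failure. */
int zero_page_fork_demo(size_t npages)
{
	long page_size = sysconf(_SC_PAGESIZE);

	if (page_size <= 0 || npages < 2)
		return -1;

	size_t len = npages * (size_t)page_size;
	void *map = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (map == MAP_FAILED)
		return -1;

	volatile unsigned char *bytes = map;

	/* A write fault gives the first page a real anonymous page. */
	bytes[0] = 0xaa;

	/* Read faults only: assumed to map the shared zero page. */
	unsigned int sum = 0;
	for (size_t i = 1; i < npages; i++)
		sum += bytes[i * (size_t)page_size];

	pid_t pid = fork();
	if (pid < 0) {
		munmap(map, len);
		return -1;
	}
	if (pid == 0) {
		/* Child: the inherited mapping must read back identically. */
		int ok = (sum == 0) && (bytes[0] == 0xaa);

		for (size_t i = 1; ok && i < npages; i++)
			ok = (bytes[i * (size_t)page_size] == 0);
		_exit(ok ? 0 : 1);
	}

	int status = 0;
	pid_t waited = waitpid(pid, &status, 0);

	munmap(map, len);
	if (waited != pid || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
		return -1;
	return 0;
}
```

Calling zero_page_fork_demo(16) from a small main() and running it in a loop is one way to exercise the fork and unmap paths for such mappings; the program itself cannot observe the kernel's reference counts, it only creates the situation described above.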
chenqiwu April 18, 2020, 3:12 a.m. UTC | #5
On Fri, Apr 17, 2020 at 05:23:31PM +0200, Michal Hocko wrote:
> On Fri 17-04-20 22:26:18, chenqiwu wrote:
> > On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> > > On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > > > From: chenqiwu <chenqiwu@xiaomi.com>
> > > > 
> > > > If we get a special mapping page like device mapping page or zero page
> > > > when copy_one_pte, it's necessary add the page refcount count.
> > > 
> > > From the changelog it is not clear what is the actual problem and how
> > > the patch address it. Please be more verbose.
> > >
> > I don't find any actual problem, but I think there should be addressed
> > to update the page refcount for special mappings include devmap and zero
> > page instead of doing nothing else, since we copy the pte from one task
> > to the other.
> 
> As Matthew pointed out, zero pages are special. Just check how
> vm_normal_page returns NULL (the same is the case for pte_devmap). This
> means, among other things that zap_pte_range which is called during
> munmap will only clear the pte but it doesn't operate on those pages so
> there is no put_page for your get_page here.
> 
> I do realize that this might be a subtle details that might be
> confusing. On the other hand trying to formulate the specific problem
> and add an explanation of the fix in the changelog could have revealed
> this. It is really trivial to generate mappings backed by zero pages and
> if the reference count was not handled properly then it would blow up
> pretty quickly.
>
I agree. But I can't see where the normal page refcount is put back
in zap_pte_range(). Is there an imbalance between copy_one_pte() and
zap_pte_range()?

BTW, I think the else if condition can be removed since we don't need
to operate on devmap pages.

Michal Hocko April 20, 2020, 7:42 a.m. UTC | #6
On Sat 18-04-20 11:12:07, chenqiwu wrote:
> On Fri, Apr 17, 2020 at 05:23:31PM +0200, Michal Hocko wrote:
> > On Fri 17-04-20 22:26:18, chenqiwu wrote:
> > > On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> > > > On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > > > > From: chenqiwu <chenqiwu@xiaomi.com>
> > > > > 
> > > > > If we get a special mapping page like device mapping page or zero page
> > > > > when copy_one_pte, it's necessary add the page refcount count.
> > > > 
> > > > From the changelog it is not clear what is the actual problem and how
> > > > the patch address it. Please be more verbose.
> > > >
> > > I don't find any actual problem, but I think there should be addressed
> > > to update the page refcount for special mappings include devmap and zero
> > > page instead of doing nothing else, since we copy the pte from one task
> > > to the other.
> > 
> > As Matthew pointed out, zero pages are special. Just check how
> > vm_normal_page returns NULL (the same is the case for pte_devmap). This
> > means, among other things that zap_pte_range which is called during
> > munmap will only clear the pte but it doesn't operate on those pages so
> > there is no put_page for your get_page here.
> > 
> > I do realize that this might be a subtle details that might be
> > confusing. On the other hand trying to formulate the specific problem
> > and add an explanation of the fix in the changelog could have revealed
> > this. It is really trivial to generate mappings backed by zero pages and
> > if the reference count was not handled properly then it would blow up
> > pretty quickly.
> >
> I agree. But I can't see where to put the normal page refcount back
> in zap_pte_range(), is there an imbalnace between copy_one_pte() and
> zap_pte_range()?

No, there is no imbalance. The reference counter handling is very well
hidden. Each page gets tracked by __tlb_remove_page and is later handled
by tlb_flush_mmu. That is all done to optimize the TLB flushing that is
necessary when pages are freed.
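
The deferred release described above can be pictured with a small userspace model. This is only a sketch of the batching idea, not the kernel's implementation: struct toy_gather, TOY_BATCH_MAX and the toy_* functions are hypothetical names, and the batch size of 4 is arbitrary. Pages are queued while the range is walked, and their references are dropped only when the batch is flushed, after the step that stands in for TLB invalidation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH_MAX 4	/* arbitrary batch size for the model */

/* Hypothetical stand-in for struct page. */
struct toy_page {
	int refcount;
};

/* Hypothetical stand-in for the gather structure that tracks pages. */
struct toy_gather {
	struct toy_page *pages[TOY_BATCH_MAX];
	size_t nr;
	unsigned int tlb_flushes;
};

/* Queue a page for later release. No reference is dropped here.
 * Returns true when the batch is full and must be flushed. */
bool toy_remove_page(struct toy_gather *tlb, struct toy_page *page)
{
	assert(tlb->nr < TOY_BATCH_MAX);
	tlb->pages[tlb->nr++] = page;
	return tlb->nr == TOY_BATCH_MAX;
}

/* "Invalidate the TLB" once, then drop the reference on every
 * queued page and empty the batch. */
void toy_flush(struct toy_gather *tlb)
{
	tlb->tlb_flushes++;
	for (size_t i = 0; i < tlb->nr; i++)
		tlb->pages[i]->refcount--;
	tlb->nr = 0;
}

/* Unmap n pages through the batch. Stores the number of flushes in
 * *flushes and returns the sum of the remaining reference counts. */
int toy_zap_range(struct toy_page *pages, size_t n, unsigned int *flushes)
{
	struct toy_gather tlb = { .nr = 0, .tlb_flushes = 0 };
	int remaining = 0;

	for (size_t i = 0; i < n; i++)
		if (toy_remove_page(&tlb, &pages[i]))
			toy_flush(&tlb);
	if (tlb.nr)
		toy_flush(&tlb);	/* final partial batch */

	for (size_t i = 0; i < n; i++)
		remaining += pages[i].refcount;
	*flushes = tlb.tlb_flushes;
	return remaining;
}
```

In the model, ten pages with one reference each are released with three flushes instead of ten, and no reference is dropped at the point where a page is merely queued, which is why the matching release is hard to spot when reading the unmap loop alone.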

Patch

diff --git a/mm/memory.c b/mm/memory.c
index f703fe8..a57975a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -802,8 +802,9 @@  struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 		get_page(page);
 		page_dup_rmap(page, false);
 		rss[mm_counter(page)]++;
-	} else if (pte_devmap(pte)) {
+	} else if (pte_devmap(pte) || is_zero_pfn(pte_pfn(pte))) {
 		page = pte_page(pte);
+		get_page(page);
 	}
 
 out_set_pte: