Message ID | 1587119514-29679-1-git-send-email-qiwuchen55@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm/memory: add refcount for special mapping page in copy_one_pte() |
On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> From: chenqiwu <chenqiwu@xiaomi.com>
>
> If we get a special mapping page like device mapping page or zero page
> when copy_one_pte, it's necessary add the page refcount count.

From the changelog it is not clear what is the actual problem and how
the patch address it. Please be more verbose.

> Signed-off-by: chenqiwu <chenqiwu@xiaomi.com>
> ---
>  mm/memory.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/mm/memory.c b/mm/memory.c
> index f703fe8..a57975a 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -802,8 +802,9 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
>  		get_page(page);
>  		page_dup_rmap(page, false);
>  		rss[mm_counter(page)]++;
> -	} else if (pte_devmap(pte)) {
> +	} else if (pte_devmap(pte) || is_zero_pfn(pte_pfn(pte))) {
>  		page = pte_page(pte);
> +		get_page(page);
>  	}
>
>  out_set_pte:
> --
> 1.9.1
>
On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > From: chenqiwu <chenqiwu@xiaomi.com>
> >
> > If we get a special mapping page like device mapping page or zero page
> > when copy_one_pte, it's necessary add the page refcount count.
>
> From the changelog it is not clear what is the actual problem and how
> the patch address it. Please be more verbose.
>
I haven't found any actual problem, but I think the page refcount should
be updated for special mappings, including devmap and zero pages, instead
of doing nothing, since we copy the pte from one task to the other.
On Fri, Apr 17, 2020 at 10:26:18PM +0800, chenqiwu wrote:
> On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> > On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > > From: chenqiwu <chenqiwu@xiaomi.com>
> > >
> > > If we get a special mapping page like device mapping page or zero page
> > > when copy_one_pte, it's necessary add the page refcount count.
> >
> > From the changelog it is not clear what is the actual problem and how
> > the patch address it. Please be more verbose.
> >
> I don't find any actual problem, but I think there should be addressed
> to update the page refcount for special mappings include devmap and zero
> page instead of doing nothing else, since we copy the pte from one task
> to the other.

But the zero page is special. It's never freed. So unless we're seeing
a refcount problem with the zero page, I would suggest that your patch is
eventually going to overflow the refcount on the zero page.
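For a rough sense of how quickly such an overflow could happen, here is a back-of-the-envelope sketch in user-space C. It assumes the page reference count is a signed 32-bit integer, that every copied zero-page pte leaks exactly one reference, and that nothing ever drops those references. The function name and its parameters are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: how many complete fork() calls fit before a
 * signed 32-bit reference count would overflow, assuming each fork
 * leaks one reference per zero-page pte in the parent's mapping.
 */
static uint64_t toy_forks_until_overflow(uint64_t mapped_bytes,
					 uint64_t page_size,
					 uint64_t start_count)
{
	assert(page_size > 0 && mapped_bytes >= page_size);
	assert(start_count <= (uint64_t)INT32_MAX);

	/* Number of ptes pointing at the zero page in the mapping. */
	uint64_t zero_ptes = mapped_bytes / page_size;

	/* References that can still be taken before the counter wraps. */
	uint64_t headroom = (uint64_t)INT32_MAX - start_count;

	/* Only complete forks count; the next one would overflow. */
	return headroom / zero_ptes;
}
```

Under these assumptions, a process with 1 GiB of zero-page-backed memory and 4 KiB pages has 262144 such ptes, so calling `toy_forks_until_overflow(1ULL << 30, 4096, 1)` yields 8191 forks before the count would wrap.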
On Fri 17-04-20 22:26:18, chenqiwu wrote:
> On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> > On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > > From: chenqiwu <chenqiwu@xiaomi.com>
> > >
> > > If we get a special mapping page like device mapping page or zero page
> > > when copy_one_pte, it's necessary add the page refcount count.
> >
> > From the changelog it is not clear what is the actual problem and how
> > the patch address it. Please be more verbose.
> >
> I don't find any actual problem, but I think there should be addressed
> to update the page refcount for special mappings include devmap and zero
> page instead of doing nothing else, since we copy the pte from one task
> to the other.

As Matthew pointed out, zero pages are special. Just check how
vm_normal_page returns NULL (the same is the case for pte_devmap). This
means, among other things, that zap_pte_range, which is called during
munmap, will only clear the pte but doesn't operate on those pages, so
there is no put_page for your get_page here.

I do realize that this is a subtle detail that might be confusing. On
the other hand, trying to formulate the specific problem and add an
explanation of the fix in the changelog could have revealed this. It is
really trivial to generate mappings backed by zero pages, and if the
reference count was not handled properly then it would blow up pretty
quickly.
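The imbalance described here can be illustrated with a small user-space model. This is only a sketch under simplifying assumptions, not kernel code: every `toy_*` name is hypothetical, the page is reduced to a bare counter, and the release of a normal page happens immediately rather than through the batched TLB path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for struct page; only the reference count is modelled. */
struct toy_page {
	int refcount;
	bool special;	/* models a zero page or a devmap page */
};

/* Toy vm_normal_page(): special pages are reported as NULL. */
static struct toy_page *toy_vm_normal_page(struct toy_page *page)
{
	return page->special ? NULL : page;
}

/* Toy copy_one_pte(); take_special_ref models the proposed patch. */
static void toy_copy_one_pte(struct toy_page *page, bool take_special_ref)
{
	if (toy_vm_normal_page(page))
		page->refcount++;	/* get_page() for a normal page */
	else if (take_special_ref)
		page->refcount++;	/* the get_page() the patch adds */
}

/* Toy zap_pte_range(): only normal pages ever get a matching put. */
static void toy_zap_pte(struct toy_page *page)
{
	struct toy_page *normal = toy_vm_normal_page(page);

	if (!normal)
		return;		/* pte is cleared, page is left untouched */
	assert(normal->refcount > 0);
	normal->refcount--;	/* put_page() equivalent */
}

/* Run fork-then-unmap cycles and return the resulting count. */
static int toy_fork_unmap_cycles(struct toy_page *page, int cycles,
				 bool take_special_ref)
{
	for (int i = 0; i < cycles; i++) {
		toy_copy_one_pte(page, take_special_ref);
		toy_zap_pte(page);
	}
	return page->refcount;
}
```

In this model a normal page, and a special page without the patch, both return to their starting count after any number of cycles, while a special page with the patch gains one reference per cycle that is never dropped.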
On Fri, Apr 17, 2020 at 05:23:31PM +0200, Michal Hocko wrote:
> On Fri 17-04-20 22:26:18, chenqiwu wrote:
> > On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> > > On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > > > From: chenqiwu <chenqiwu@xiaomi.com>
> > > >
> > > > If we get a special mapping page like device mapping page or zero page
> > > > when copy_one_pte, it's necessary add the page refcount count.
> > >
> > > From the changelog it is not clear what is the actual problem and how
> > > the patch address it. Please be more verbose.
> > >
> > I don't find any actual problem, but I think there should be addressed
> > to update the page refcount for special mappings include devmap and zero
> > page instead of doing nothing else, since we copy the pte from one task
> > to the other.
>
> As Matthew pointed out, zero pages are special. Just check how
> vm_normal_page returns NULL (the same is the case for pte_devmap). This
> means, among other things that zap_pte_range which is called during
> munmap will only clear the pte but it doesn't operate on those pages so
> there is no put_page for your get_page here.
>
> I do realize that this might be a subtle details that might be
> confusing. On the other hand trying to formulate the specific problem
> and add an explanation of the fix in the changelog could have revealed
> this. It is really trivial to generate mappings backed by zero pages and
> if the reference count was not handled properly then it would blow up
> pretty quickly.
>
I agree. But I can't see where the normal page refcount is put back
in zap_pte_range(). Is there an imbalance between copy_one_pte() and
zap_pte_range()?

BTW, I think the else if condition can be removed, since we don't need to
operate on devmap pages.
On Sat 18-04-20 11:12:07, chenqiwu wrote:
> On Fri, Apr 17, 2020 at 05:23:31PM +0200, Michal Hocko wrote:
> > On Fri 17-04-20 22:26:18, chenqiwu wrote:
> > > On Fri, Apr 17, 2020 at 01:43:12PM +0200, Michal Hocko wrote:
> > > > On Fri 17-04-20 18:31:54, qiwuchen55@gmail.com wrote:
> > > > > From: chenqiwu <chenqiwu@xiaomi.com>
> > > > >
> > > > > If we get a special mapping page like device mapping page or zero page
> > > > > when copy_one_pte, it's necessary add the page refcount count.
> > > >
> > > > From the changelog it is not clear what is the actual problem and how
> > > > the patch address it. Please be more verbose.
> > > >
> > > I don't find any actual problem, but I think there should be addressed
> > > to update the page refcount for special mappings include devmap and zero
> > > page instead of doing nothing else, since we copy the pte from one task
> > > to the other.
> >
> > As Matthew pointed out, zero pages are special. Just check how
> > vm_normal_page returns NULL (the same is the case for pte_devmap). This
> > means, among other things that zap_pte_range which is called during
> > munmap will only clear the pte but it doesn't operate on those pages so
> > there is no put_page for your get_page here.
> >
> > I do realize that this might be a subtle details that might be
> > confusing. On the other hand trying to formulate the specific problem
> > and add an explanation of the fix in the changelog could have revealed
> > this. It is really trivial to generate mappings backed by zero pages and
> > if the reference count was not handled properly then it would blow up
> > pretty quickly.
> >
> I agree. But I can't see where to put the normal page refcount back
> in zap_pte_range(), is there an imbalnace between copy_one_pte() and
> zap_pte_range()?

No, there is no imbalance. The reference counter handling is very well
hidden. Each page gets tracked by __tlb_remove_page and is later handled
by tlb_flush_mmu. All of that is done to optimize the TLB flushing that
is necessary when pages are freed.
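The deferred release described above can be sketched as a toy user-space model. It is an illustration under stated assumptions, not the kernel's mmu_gather implementation: the structures, the batch size of 4, and every `toy_*` or `batch_*` name are hypothetical. The idea shown is only that pages are queued while ptes are cleared, and their references are dropped later, after a single flush covers the whole batch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BATCH_MAX 4	/* hypothetical batch size for this sketch */

/* Toy page: only the reference count is modelled. */
struct batch_page {
	int refcount;
};

/* Toy gather structure: pages whose release has been deferred. */
struct toy_gather {
	struct batch_page *pages[TOY_BATCH_MAX];
	size_t nr;
	unsigned long tlb_flushes;	/* flushes issued so far */
};

/* Toy tlb_flush_mmu(): one flush, then drop every deferred reference. */
static void toy_tlb_flush_mmu(struct toy_gather *tlb)
{
	if (tlb->nr == 0)
		return;
	tlb->tlb_flushes++;	/* a single flush covers the whole batch */
	for (size_t i = 0; i < tlb->nr; i++) {
		assert(tlb->pages[i]->refcount > 0);
		tlb->pages[i]->refcount--;	/* the deferred put */
	}
	tlb->nr = 0;
}

/* Toy __tlb_remove_page(): queue a page; true means the batch is full. */
static bool toy_tlb_remove_page(struct toy_gather *tlb,
				struct batch_page *page)
{
	assert(tlb->nr < TOY_BATCH_MAX);
	tlb->pages[tlb->nr++] = page;
	return tlb->nr == TOY_BATCH_MAX;
}

/* Toy zap loop: queue each page and flush whenever the batch fills up. */
static void toy_zap_range(struct toy_gather *tlb, struct batch_page *pages,
			  size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (toy_tlb_remove_page(tlb, &pages[i]))
			toy_tlb_flush_mmu(tlb);
	toy_tlb_flush_mmu(tlb);	/* release whatever is still queued */
}
```

With ten pages and a batch size of 4, this model issues three flushes instead of ten, and no page loses its reference until the flush covering it has happened, which is why the put is not visible at the point where the pte is cleared.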
diff --git a/mm/memory.c b/mm/memory.c
index f703fe8..a57975a 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -802,8 +802,9 @@ struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
 		get_page(page);
 		page_dup_rmap(page, false);
 		rss[mm_counter(page)]++;
-	} else if (pte_devmap(pte)) {
+	} else if (pte_devmap(pte) || is_zero_pfn(pte_pfn(pte))) {
 		page = pte_page(pte);
+		get_page(page);
 	}
 
 out_set_pte: