diff mbox

[RFC,V2] mm:add zero_page _mapcount when mapped into user space

Message ID 35FD53F367049845BC99AC72306C23D103E688B313E0@CNBJMBX05.corpusers.net (mailing list archive)
State New, archived
Headers show

Commit Message

Wang, Yalin Dec. 2, 2014, 9:27 a.m. UTC
This patch add/dec zero_page's _mapcount to make sure
the mapcount is correct for zero_page,
so that when read from /proc/kpagecount, zero_page's
mapcount is also correct, userspace process like procrank can
calculate PSS correctly.

Signed-off-by: Yalin Wang <yalin.wang@sonymobile.com>
---
 mm/memory.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Kirill A. Shutemov Dec. 2, 2014, 11:30 a.m. UTC | #1
On Tue, Dec 02, 2014 at 05:27:36PM +0800, Wang, Yalin wrote:
> This patch add/dec zero_page's _mapcount to make sure
> the mapcount is correct for zero_page,
> so that when read from /proc/kpagecount, zero_page's
> mapcount is also correct, userspace process like procrank can
> calculate PSS correctly.

I don't have specific code path to point to, but I would expect zero page
with non-zero mapcount would cause a problem with rmap.

How do you test the change?
Wang, Yalin Dec. 4, 2014, 6:10 a.m. UTC | #2
> -----Original Message-----
> From: Kirill A. Shutemov [mailto:kirill@shutemov.name]
> Sent: Tuesday, December 02, 2014 7:30 PM
> To: Wang, Yalin
> Cc: 'linux-kernel@vger.kernel.org'; 'linux-mm@kvack.org'; 'linux-arm-
> kernel@lists.infradead.org'
> Subject: Re: [RFC V2] mm:add zero_page _mapcount when mapped into user
> space
> 
> On Tue, Dec 02, 2014 at 05:27:36PM +0800, Wang, Yalin wrote:
> > This patch add/dec zero_page's _mapcount to make sure the mapcount is
> > correct for zero_page, so that when read from /proc/kpagecount,
> > zero_page's mapcount is also correct, userspace process like procrank
> > can calculate PSS correctly.
> 
> I don't have specific code path to point to, but I would expect zero page
> with non-zero mapcount would cause a problem with rmap.
> 
> How do you test the change?
> 
I just test it to see the mapcount from /proc/pid/pagemap  and /proc/kpagecount ,
It works well,
The problem is that when I see /proc/pid/smaps ,
The Rss / Pss don't calculate zero_page map,
Because smaps_pte_entry() --> vm_normal_page( ),
Will return NULL for zero_page,

But when userspace process cat /proc/pid/pagemap  ,
It will see zero_page mapped,
And will treat as Rss ,  
This is weird, should we also omit zero_page in /proc/pid/pagemap ?
Or add zero_page as Rss in /proc/pid/smaps ? 

I think we should add zero_page into Rss ,
Because it is really mapped into userspace address space.
And will let userspace memory analysis more accurate .

Thanks
Kirill A. Shutemov Dec. 4, 2014, 12:28 p.m. UTC | #3
On Thu, Dec 04, 2014 at 02:10:53PM +0800, Wang, Yalin wrote:
> > -----Original Message-----
> > From: Kirill A. Shutemov [mailto:kirill@shutemov.name]
> > Sent: Tuesday, December 02, 2014 7:30 PM
> > To: Wang, Yalin
> > Cc: 'linux-kernel@vger.kernel.org'; 'linux-mm@kvack.org'; 'linux-arm-
> > kernel@lists.infradead.org'
> > Subject: Re: [RFC V2] mm:add zero_page _mapcount when mapped into user
> > space
> > 
> > On Tue, Dec 02, 2014 at 05:27:36PM +0800, Wang, Yalin wrote:
> > > This patch add/dec zero_page's _mapcount to make sure the mapcount is
> > > correct for zero_page, so that when read from /proc/kpagecount,
> > > zero_page's mapcount is also correct, userspace process like procrank
> > > can calculate PSS correctly.
> > 
> > I don't have specific code path to point to, but I would expect zero page
> > with non-zero mapcount would cause a problem with rmap.
> > 
> > How do you test the change?
> > 
> I just test it to see the mapcount from /proc/pid/pagemap  and /proc/kpagecount ,
> It works well,

I took a closer look and your patch is broken in multiple places:
 - on zap_pte_range() you don't decrement mapcount;
 - you don't update rss counters for mm;
 - copy_one_pte() doesn't increase mapcount;
 - ...

Basically, each and every vm_normal_page() call must be audited. As first
step. And you totally skip huge zero page.

Proper mapcount handling for zero page would require a lot more work and I
don't think it worth it. Gain is too small.

NAK.

> The problem is that when I see /proc/pid/smaps ,
> The Rss / Pss don't calculate zero_page map,
> Because smaps_pte_entry() --> vm_normal_page( ),
> Will return NULL for zero_page,
> 
> But when userspace process cat /proc/pid/pagemap  ,
> It will see zero_page mapped,
> And will treat as Rss ,  
> This is weird, should we also omit zero_page in /proc/pid/pagemap ?
> Or add zero_page as Rss in /proc/pid/smaps ? 
> 
> I think we should add zero_page into Rss ,
> Because it is really mapped into userspace address space.
> And will let userspace memory analysis more accurate .

It would be easier for userspace to find out pfn of zero page and take it
into account.

Note: some architectures have multiple zero page due to coloring.
Konstantin Khlebnikov Dec. 5, 2014, 6:39 a.m. UTC | #4
On Thu, Dec 4, 2014 at 3:28 PM, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Thu, Dec 04, 2014 at 02:10:53PM +0800, Wang, Yalin wrote:
>> > -----Original Message-----
>> > From: Kirill A. Shutemov [mailto:kirill@shutemov.name]
>> > Sent: Tuesday, December 02, 2014 7:30 PM
>> > To: Wang, Yalin
>> > Cc: 'linux-kernel@vger.kernel.org'; 'linux-mm@kvack.org'; 'linux-arm-
>> > kernel@lists.infradead.org'
>> > Subject: Re: [RFC V2] mm:add zero_page _mapcount when mapped into user
>> > space
>> >
>> > On Tue, Dec 02, 2014 at 05:27:36PM +0800, Wang, Yalin wrote:
>> > > This patch add/dec zero_page's _mapcount to make sure the mapcount is
>> > > correct for zero_page, so that when read from /proc/kpagecount,
>> > > zero_page's mapcount is also correct, userspace process like procrank
>> > > can calculate PSS correctly.

Instead of tweaking mapcount you could mark zero-pages in /proc/kpageflags
and handle them accordingly in userspace. Or mark zero pages with special
magic _mapcount and detect it in /proc/kpagecount.

>> >
>> > I don't have specific code path to point to, but I would expect zero page
>> > with non-zero mapcount would cause a problem with rmap.
>> >
>> > How do you test the change?
>> >
>> I just test it to see the mapcount from /proc/pid/pagemap  and /proc/kpagecount ,
>> It works well,
>
> I took a closer look and your patch is broken in multiple places:
>  - on zap_pte_range() you don't decrement mapcount;
>  - you don't update rss counters for mm;
>  - copy_one_pte() doesn't increase mapcount;
>  - ...
>
> Basically, each and every vm_normal_page() call must be audited. As first
> step. And you totally skip huge zero page.
>
> Proper mapcount handling for zero page would require a lot more work and I
> don't think it worth it. Gain is too small.
>
> NAK.
>
>> The problem is that when I see /proc/pid/smaps ,
>> The Rss / Pss don't calculate zero_page map,
>> Because smaps_pte_entry() --> vm_normal_page( ),
>> Will return NULL for zero_page,
>>
>> But when userspace process cat /proc/pid/pagemap  ,
>> It will see zero_page mapped,
>> And will treat as Rss ,
>> This is weird, should we also omit zero_page in /proc/pid/pagemap ?
>> Or add zero_page as Rss in /proc/pid/smaps ?
>>
>> I think we should add zero_page into Rss ,
>> Because it is really mapped into userspace address space.
>> And will let userspace memory analysis more accurate .
>
> It would be easier for userspace to find out pfn of zero page and take it
> into account.
>
> Note: some architectures have multiple zero page due to coloring.
>
> --
>  Kirill A. Shutemov
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
Wang, Yalin Dec. 5, 2014, 8:08 a.m. UTC | #5
> -----Original Message-----
> From: Konstantin Khlebnikov [mailto:koct9i@gmail.com]
> Sent: Friday, December 05, 2014 2:39 PM
> To: Kirill A. Shutemov
> Cc: Wang, Yalin; linux-kernel@vger.kernel.org; linux-mm@kvack.org; linux-
> arm-kernel@lists.infradead.org
> Subject: Re: [RFC V2] mm:add zero_page _mapcount when mapped into user
> space
> 
> On Thu, Dec 4, 2014 at 3:28 PM, Kirill A. Shutemov <kirill@shutemov.name>
> wrote:
> > On Thu, Dec 04, 2014 at 02:10:53PM +0800, Wang, Yalin wrote:
> >> > -----Original Message-----
> >> > From: Kirill A. Shutemov [mailto:kirill@shutemov.name]
> >> > Sent: Tuesday, December 02, 2014 7:30 PM
> >> > To: Wang, Yalin
> >> > Cc: 'linux-kernel@vger.kernel.org'; 'linux-mm@kvack.org';
> >> > 'linux-arm- kernel@lists.infradead.org'
> >> > Subject: Re: [RFC V2] mm:add zero_page _mapcount when mapped into
> >> > user space
> >> >
> >> > On Tue, Dec 02, 2014 at 05:27:36PM +0800, Wang, Yalin wrote:
> >> > > This patch add/dec zero_page's _mapcount to make sure the
> >> > > mapcount is correct for zero_page, so that when read from
> >> > > /proc/kpagecount, zero_page's mapcount is also correct, userspace
> >> > > process like procrank can calculate PSS correctly.
> 
> Instead of tweaking mapcount you could mark zero-pages in /proc/kpageflags
> and handle them accordingly in userspace. Or mark zero pages with special
> magic _mapcount and detect it in /proc/kpagecount.
> 
I think add KPF_ZERO_PAGE in kpageflags is better.
I will make another patch for review .

Thanks
diff mbox

Patch

diff --git a/mm/memory.c b/mm/memory.c
index 3e50383..7215423 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2198,6 +2198,7 @@  gotten:
 		new_page = alloc_zeroed_user_highpage_movable(vma, address);
 		if (!new_page)
 			goto oom;
+		atomic_dec(&pfn_to_page(pte_pfn(orig_pte))->_mapcount);
 	} else {
 		new_page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, address);
 		if (!new_page)
@@ -2647,6 +2648,7 @@  static int do_anonymous_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		page_table = pte_offset_map_lock(mm, pmd, address, &ptl);
 		if (!pte_none(*page_table))
 			goto unlock;
+		atomic_inc(&pfn_to_page(my_zero_pfn(address))->_mapcount);
 		goto setpte;
 	}