diff mbox series

mm/rmap.c: reuse mergeable anon_vma as parent when fork

Message ID 20191004160632.30251-1-richardw.yang@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series mm/rmap.c: reuse mergeable anon_vma as parent when fork | expand

Commit Message

Wei Yang Oct. 4, 2019, 4:06 p.m. UTC
In function __anon_vma_prepare(), we will try to find anon_vma if it is
possible to reuse it. While on fork, the logic is different.

Since commit 5beb49305251 ("mm: change anon_vma linking to fix
multi-process server scalability issue"), function anon_vma_clone()
tries to allocate new anon_vma for child process. But the logic here
will allocate a new anon_vma for each vma, even in parent this vma
is mergeable and share the same anon_vma with its sibling. This may do
better for scalability issue, while it is not necessary to do so
especially after interval tree is used.

Commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy")
tries to reuse some anon_vma by counting child anon_vma and attached
vmas. While for those mergeable anon_vmas, we can just reuse it and not
necessary to go through the logic.

After this change, kernel build test reduces 20% anon_vma allocation.

Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
---
 mm/rmap.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

Comments

Matthew Wilcox (Oracle) Oct. 4, 2019, 4:11 p.m. UTC | #1
On Sat, Oct 05, 2019 at 12:06:32AM +0800, Wei Yang wrote:
> After this change, kernel build test reduces 20% anon_vma allocation.

But does it have any effect on elapsed time or peak memory consumption?
Konstantin Khlebnikov Oct. 4, 2019, 4:33 p.m. UTC | #2
On 04/10/2019 19.06, Wei Yang wrote:
> In function __anon_vma_prepare(), we will try to find anon_vma if it is
> possible to reuse it. While on fork, the logic is different.
> 
> Since commit 5beb49305251 ("mm: change anon_vma linking to fix
> multi-process server scalability issue"), function anon_vma_clone()
> tries to allocate new anon_vma for child process. But the logic here
> will allocate a new anon_vma for each vma, even in parent this vma
> is mergeable and share the same anon_vma with its sibling. This may do
> better for scalability issue, while it is not necessary to do so
> especially after interval tree is used.
> 
> Commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy")
> tries to reuse some anon_vma by counting child anon_vma and attached
> vmas. While for those mergeable anon_vmas, we can just reuse it and not
> necessary to go through the logic.
> 
> After this change, kernel build test reduces 20% anon_vma allocation.
> 

Makes sense. This might have much bigger effect for scenarios when task
unmaps holes in huge vma as red-zones between allocations and then forks.

Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>

> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
> ---
>   mm/rmap.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/mm/rmap.c b/mm/rmap.c
> index d9a23bb773bf..12f6c3d7fd9d 100644
> --- a/mm/rmap.c
> +++ b/mm/rmap.c
> @@ -262,6 +262,17 @@ int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
>   {
>   	struct anon_vma_chain *avc, *pavc;
>   	struct anon_vma *root = NULL;
> +	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
> +
> +	/*
> +	 * If parent share anon_vma with its vm_prev, keep this sharing in in
> +	 * child.
> +	 *
> +	 * 1. Parent has vm_prev, which implies we have vm_prev.
> +	 * 2. Parent and its vm_prev have the same anon_vma.
> +	 */
> +	if (pprev && pprev->anon_vma == src->anon_vma)
> +		dst->anon_vma = prev->anon_vma;
>   
>   	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
>   		struct anon_vma *anon_vma;
>
Wei Yang Oct. 4, 2019, 10:33 p.m. UTC | #3
On Fri, Oct 04, 2019 at 09:11:20AM -0700, Matthew Wilcox wrote:
>On Sat, Oct 05, 2019 at 12:06:32AM +0800, Wei Yang wrote:
>> After this change, kernel build test reduces 20% anon_vma allocation.
>
>But does it have any effect on elapsed time or peak memory consumption?

I didn't evaluate these aspects.

BTW, how to evaluate peak memory consumption? This looks a transient value.
Wei Yang Oct. 4, 2019, 10:38 p.m. UTC | #4
On Fri, Oct 04, 2019 at 07:33:53PM +0300, Konstantin Khlebnikov wrote:
>On 04/10/2019 19.06, Wei Yang wrote:
>> In function __anon_vma_prepare(), we will try to find anon_vma if it is
>> possible to reuse it. While on fork, the logic is different.
>> 
>> Since commit 5beb49305251 ("mm: change anon_vma linking to fix
>> multi-process server scalability issue"), function anon_vma_clone()
>> tries to allocate new anon_vma for child process. But the logic here
>> will allocate a new anon_vma for each vma, even in parent this vma
>> is mergeable and share the same anon_vma with its sibling. This may do
>> better for scalability issue, while it is not necessary to do so
>> especially after interval tree is used.
>> 
>> Commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma hierarchy")
>> tries to reuse some anon_vma by counting child anon_vma and attached
>> vmas. While for those mergeable anon_vmas, we can just reuse it and not
>> necessary to go through the logic.
>> 
>> After this change, kernel build test reduces 20% anon_vma allocation.
>> 
>
>Makes sense. This might have much bigger effect for scenarios when task
>unmaps holes in huge vma as red-zones between allocations and then forks.
>
>Acked-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
>

Thanks
Rik van Riel Oct. 4, 2019, 11:45 p.m. UTC | #5
On Sat, 2019-10-05 at 00:06 +0800, Wei Yang wrote:
> In function __anon_vma_prepare(), we will try to find anon_vma if it
> is
> possible to reuse it. While on fork, the logic is different.
> 
> Since commit 5beb49305251 ("mm: change anon_vma linking to fix
> multi-process server scalability issue"), function anon_vma_clone()
> tries to allocate new anon_vma for child process. But the logic here
> will allocate a new anon_vma for each vma, even in parent this vma
> is mergeable and share the same anon_vma with its sibling. This may
> do
> better for scalability issue, while it is not necessary to do so
> especially after interval tree is used.
> 
> Commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma
> hierarchy")
> tries to reuse some anon_vma by counting child anon_vma and attached
> vmas. While for those mergeable anon_vmas, we can just reuse it and
> not
> necessary to go through the logic.
> 
> After this change, kernel build test reduces 20% anon_vma allocation.
> 
> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>

Acked-by: Rik van Riel <riel@surriel.com>
Wei Yang Oct. 4, 2019, 11:48 p.m. UTC | #6
On Fri, Oct 04, 2019 at 09:11:20AM -0700, Matthew Wilcox wrote:
>On Sat, Oct 05, 2019 at 12:06:32AM +0800, Wei Yang wrote:
>> After this change, kernel build test reduces 20% anon_vma allocation.
>
>But does it have any effect on elapsed time or peak memory consumption?

Do the same kernel build test and record time:


Origin

real	2m50.467s
user	17m52.002s
sys	1m51.953s    

real	2m48.662s
user	17m55.464s
sys	1m50.553s    

real	2m51.143s
user	17m59.687s
sys	1m53.600s    


Patched

real	2m43.733s
user	17m25.705s
sys	1m41.791s    

real	2m47.146s
user	17m47.451s
sys	1m43.474s    

real	2m45.763s
user	17m38.230s
sys	1m42.102s    


For time in sys, it reduced 8.5%.
Wei Yang Oct. 4, 2019, 11:49 p.m. UTC | #7
On Fri, Oct 04, 2019 at 07:45:26PM -0400, Rik van Riel wrote:
>On Sat, 2019-10-05 at 00:06 +0800, Wei Yang wrote:
>> In function __anon_vma_prepare(), we will try to find anon_vma if it
>> is
>> possible to reuse it. While on fork, the logic is different.
>> 
>> Since commit 5beb49305251 ("mm: change anon_vma linking to fix
>> multi-process server scalability issue"), function anon_vma_clone()
>> tries to allocate new anon_vma for child process. But the logic here
>> will allocate a new anon_vma for each vma, even in parent this vma
>> is mergeable and share the same anon_vma with its sibling. This may
>> do
>> better for scalability issue, while it is not necessary to do so
>> especially after interval tree is used.
>> 
>> Commit 7a3ef208e662 ("mm: prevent endless growth of anon_vma
>> hierarchy")
>> tries to reuse some anon_vma by counting child anon_vma and attached
>> vmas. While for those mergeable anon_vmas, we can just reuse it and
>> not
>> necessary to go through the logic.
>> 
>> After this change, kernel build test reduces 20% anon_vma allocation.
>> 
>> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com>
>
>Acked-by: Rik van Riel <riel@surriel.com>
>

Thanks

>-- 
>All Rights Reversed.
Matthew Wilcox (Oracle) Oct. 5, 2019, 1:15 a.m. UTC | #8
On Sat, Oct 05, 2019 at 07:48:45AM +0800, Wei Yang wrote:
> On Fri, Oct 04, 2019 at 09:11:20AM -0700, Matthew Wilcox wrote:
> >On Sat, Oct 05, 2019 at 12:06:32AM +0800, Wei Yang wrote:
> >> After this change, kernel build test reduces 20% anon_vma allocation.
> >
> >But does it have any effect on elapsed time or peak memory consumption?
> 
> Do the same kernel build test and record time:
> 
> 
> Origin
> 
> real	2m50.467s
> user	17m52.002s
> sys	1m51.953s    
> 
> real	2m48.662s
> user	17m55.464s
> sys	1m50.553s    
> 
> real	2m51.143s
> user	17m59.687s
> sys	1m53.600s    
> 
> 
> Patched
> 
> real	2m43.733s
> user	17m25.705s
> sys	1m41.791s    
> 
> real	2m47.146s
> user	17m47.451s
> sys	1m43.474s    
> 
> real	2m45.763s
> user	17m38.230s
> sys	1m42.102s    
> 
> 
> For time in sys, it reduced 8.5%.

That's compelling!
Wei Yang Oct. 5, 2019, 12:35 p.m. UTC | #9
On Fri, Oct 04, 2019 at 06:15:54PM -0700, Matthew Wilcox wrote:
>On Sat, Oct 05, 2019 at 07:48:45AM +0800, Wei Yang wrote:
>> On Fri, Oct 04, 2019 at 09:11:20AM -0700, Matthew Wilcox wrote:
>> >On Sat, Oct 05, 2019 at 12:06:32AM +0800, Wei Yang wrote:
>> >> After this change, kernel build test reduces 20% anon_vma allocation.
>> >
>> >But does it have any effect on elapsed time or peak memory consumption?
>> 
>> Do the same kernel build test and record time:
>> 
>> 
>> Origin
>> 
>> real	2m50.467s
>> user	17m52.002s
>> sys	1m51.953s    
>> 
>> real	2m48.662s
>> user	17m55.464s
>> sys	1m50.553s    
>> 
>> real	2m51.143s
>> user	17m59.687s
>> sys	1m53.600s    
>> 
>> 
>> Patched
>> 
>> real	2m43.733s
>> user	17m25.705s
>> sys	1m41.791s    
>> 
>> real	2m47.146s
>> user	17m47.451s
>> sys	1m43.474s    
>> 
>> real	2m45.763s
>> user	17m38.230s
>> sys	1m42.102s    
>> 
>> 
>> For time in sys, it reduced 8.5%.
>
>That's compelling!

Yep, looks good :-)
diff mbox series

Patch

diff --git a/mm/rmap.c b/mm/rmap.c
index d9a23bb773bf..12f6c3d7fd9d 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -262,6 +262,17 @@  int anon_vma_clone(struct vm_area_struct *dst, struct vm_area_struct *src)
 {
 	struct anon_vma_chain *avc, *pavc;
 	struct anon_vma *root = NULL;
+	struct vm_area_struct *prev = dst->vm_prev, *pprev = src->vm_prev;
+
+	/*
+	 * If parent share anon_vma with its vm_prev, keep this sharing in in
+	 * child.
+	 *
+	 * 1. Parent has vm_prev, which implies we have vm_prev.
+	 * 2. Parent and its vm_prev have the same anon_vma.
+	 */
+	if (pprev && pprev->anon_vma == src->anon_vma)
+		dst->anon_vma = prev->anon_vma;
 
 	list_for_each_entry_reverse(pavc, &src->anon_vma_chain, same_vma) {
 		struct anon_vma *anon_vma;