Message ID | 20190806081123.22334-1-richardw.yang@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/mmap.c: refine data locality of find_vma_prev | expand |
On 8/6/19 10:11 AM, Wei Yang wrote: > When addr is out of the range of the whole rb_tree, pprev will points to > the biggest node. find_vma_prev gets is by going through the right most s/biggest/last/ ? or right-most? > node of the tree. > > Since only the last node is the one it is looking for, it is not > necessary to assign pprev to those middle stage nodes. By assigning > pprev to the last node directly, it tries to improve the function > locality a little. In the end, it will always write to the cacheline of pprev. The caller has most likely have it on stack, so it's already hot, and there's no other CPU stealing it. So I don't understand where the improved locality comes from. The compiler can also optimize the patched code so the assembly is identical to the previous code, or vice versa. Did you check for differences? The previous code is somewhat more obvious to me, so unless I'm missing something, readability and less churn suggests to not change. > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > --- > mm/mmap.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index 7e8c3e8ae75f..284bc7e51f9c 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -2271,11 +2271,10 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, > *pprev = vma->vm_prev; > } else { > struct rb_node *rb_node = mm->mm_rb.rb_node; > - *pprev = NULL; > - while (rb_node) { > - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); > + while (rb_node && rb_node->rb_right) > rb_node = rb_node->rb_right; > - } > + *pprev = rb_node ? NULL > + : rb_entry(rb_node, struct vm_area_struct, vm_rb); > } > return vma; > } >
On Tue, Aug 06, 2019 at 04:11:23PM +0800, Wei Yang wrote: > When addr is out of the range of the whole rb_tree, pprev will points to > the biggest node. find_vma_prev gets is by going through the right most > node of the tree. > > Since only the last node is the one it is looking for, it is not > necessary to assign pprev to those middle stage nodes. By assigning > pprev to the last node directly, it tries to improve the function > locality a little. > > Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> > --- > mm/mmap.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/mm/mmap.c b/mm/mmap.c > index 7e8c3e8ae75f..284bc7e51f9c 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -2271,11 +2271,10 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, > *pprev = vma->vm_prev; > } else { > struct rb_node *rb_node = mm->mm_rb.rb_node; > - *pprev = NULL; > - while (rb_node) { > - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); > + while (rb_node && rb_node->rb_right) > rb_node = rb_node->rb_right; > - } > + *pprev = rb_node ? NULL > + : rb_entry(rb_node, struct vm_area_struct, vm_rb); Can rb_node ever be NULL? assuming mm->mm_rb.rb_node is not NULL when we enter here Balbir Singh
On Tue, Aug 06, 2019 at 11:29:52AM +0200, Vlastimil Babka wrote: >On 8/6/19 10:11 AM, Wei Yang wrote: >> When addr is out of the range of the whole rb_tree, pprev will points to >> the biggest node. find_vma_prev gets is by going through the right most > >s/biggest/last/ ? or right-most? > >> node of the tree. >> >> Since only the last node is the one it is looking for, it is not >> necessary to assign pprev to those middle stage nodes. By assigning >> pprev to the last node directly, it tries to improve the function >> locality a little. > >In the end, it will always write to the cacheline of pprev. The caller has most >likely have it on stack, so it's already hot, and there's no other CPU stealing >it. So I don't understand where the improved locality comes from. The compiler >can also optimize the patched code so the assembly is identical to the previous >code, or vice versa. Did you check for differences? Vlastimil Thanks for your comment. I believe you get a point. I may not use the word locality. This patch tries to reduce some unnecessary assignment of pprev. Original code would assign the value on each node during iteration, this is what I want to reduce. The generated code looks different from my side. Would you mind sharing me how you compare the generated code? > >The previous code is somewhat more obvious to me, so unless I'm missing >something, readability and less churn suggests to not change. > >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> --- >> mm/mmap.c | 7 +++---- >> 1 file changed, 3 insertions(+), 4 deletions(-) >> >> diff --git a/mm/mmap.c b/mm/mmap.c >> index 7e8c3e8ae75f..284bc7e51f9c 100644 >> --- a/mm/mmap.c >> +++ b/mm/mmap.c >> @@ -2271,11 +2271,10 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, >> *pprev = vma->vm_prev; >> } else { >> struct rb_node *rb_node = mm->mm_rb.rb_node; >> - *pprev = NULL; >> - while (rb_node) { >> - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); >> + while (rb_node && rb_node->rb_right) >> rb_node = rb_node->rb_right; >> - } >> + *pprev = rb_node ? NULL >> + : rb_entry(rb_node, struct vm_area_struct, vm_rb); >> } >> return vma; >> } >>
On Tue, Aug 06, 2019 at 10:58:22AM +0000, Balbir Singh wrote: >On Tue, Aug 06, 2019 at 04:11:23PM +0800, Wei Yang wrote: >> When addr is out of the range of the whole rb_tree, pprev will points to >> the biggest node. find_vma_prev gets is by going through the right most >> node of the tree. >> >> Since only the last node is the one it is looking for, it is not >> necessary to assign pprev to those middle stage nodes. By assigning >> pprev to the last node directly, it tries to improve the function >> locality a little. >> >> Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> >> --- >> mm/mmap.c | 7 +++---- >> 1 file changed, 3 insertions(+), 4 deletions(-) >> >> diff --git a/mm/mmap.c b/mm/mmap.c >> index 7e8c3e8ae75f..284bc7e51f9c 100644 >> --- a/mm/mmap.c >> +++ b/mm/mmap.c >> @@ -2271,11 +2271,10 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, >> *pprev = vma->vm_prev; >> } else { >> struct rb_node *rb_node = mm->mm_rb.rb_node; >> - *pprev = NULL; >> - while (rb_node) { >> - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); >> + while (rb_node && rb_node->rb_right) >> rb_node = rb_node->rb_right; >> - } >> + *pprev = rb_node ? NULL >> + : rb_entry(rb_node, struct vm_area_struct, vm_rb); > >Can rb_node ever be NULL? assuming mm->mm_rb.rb_node is not NULL when we >enter here > My bad, it should be *pprev = !rb_node ? NULL : rb_entry(rb_node, struct vm_area_struct, vm_rb); Thanks >Balbir Singh
On Wed 07-08-19 08:31:09, Wei Yang wrote: > On Tue, Aug 06, 2019 at 11:29:52AM +0200, Vlastimil Babka wrote: > >On 8/6/19 10:11 AM, Wei Yang wrote: > >> When addr is out of the range of the whole rb_tree, pprev will points to > >> the biggest node. find_vma_prev gets is by going through the right most > > > >s/biggest/last/ ? or right-most? > > > >> node of the tree. > >> > >> Since only the last node is the one it is looking for, it is not > >> necessary to assign pprev to those middle stage nodes. By assigning > >> pprev to the last node directly, it tries to improve the function > >> locality a little. > > > >In the end, it will always write to the cacheline of pprev. The caller has most > >likely have it on stack, so it's already hot, and there's no other CPU stealing > >it. So I don't understand where the improved locality comes from. The compiler > >can also optimize the patched code so the assembly is identical to the previous > >code, or vice versa. Did you check for differences? > > Vlastimil > > Thanks for your comment. > > I believe you get a point. I may not use the word locality. This patch tries > to reduce some unnecessary assignment of pprev. > > Original code would assign the value on each node during iteration, this is > what I want to reduce. Is there any measurable difference (on micro benchmarks or regular workloads)?
On Wed, Aug 07, 2019 at 09:51:01AM +0200, Michal Hocko wrote: >On Wed 07-08-19 08:31:09, Wei Yang wrote: >> On Tue, Aug 06, 2019 at 11:29:52AM +0200, Vlastimil Babka wrote: >> >On 8/6/19 10:11 AM, Wei Yang wrote: >> >> When addr is out of the range of the whole rb_tree, pprev will points to >> >> the biggest node. find_vma_prev gets is by going through the right most >> > >> >s/biggest/last/ ? or right-most? >> > >> >> node of the tree. >> >> >> >> Since only the last node is the one it is looking for, it is not >> >> necessary to assign pprev to those middle stage nodes. By assigning >> >> pprev to the last node directly, it tries to improve the function >> >> locality a little. >> > >> >In the end, it will always write to the cacheline of pprev. The caller has most >> >likely have it on stack, so it's already hot, and there's no other CPU stealing >> >it. So I don't understand where the improved locality comes from. The compiler >> >can also optimize the patched code so the assembly is identical to the previous >> >code, or vice versa. Did you check for differences? >> >> Vlastimil >> >> Thanks for your comment. >> >> I believe you get a point. I may not use the word locality. This patch tries >> to reduce some unnecessary assignment of pprev. >> >> Original code would assign the value on each node during iteration, this is >> what I want to reduce. > >Is there any measurable difference (on micro benchmarks or regular >workloads)? I wrote a test case to compare these two methods, but not find visible difference in run time. While I found we may leverage rb_last to refine the code a little. @@ -2270,12 +2270,9 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, if (vma) { *pprev = vma->vm_prev; } else { - struct rb_node *rb_node = mm->mm_rb.rb_node; - *pprev = NULL; - while (rb_node) { - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); - rb_node = rb_node->rb_right; - } + struct rb_node *rb_node = rb_last(&mm->mm_rb); + *pprev = !rb_node ? NULL : + rb_entry(rb_node, struct vm_area_struct, vm_rb); } return vma; Not sure this style would help a little in understanding the code? >-- >Michal Hocko >SUSE Labs
On Thu 08-08-19 11:26:38, Wei Yang wrote: > On Wed, Aug 07, 2019 at 09:51:01AM +0200, Michal Hocko wrote: > >On Wed 07-08-19 08:31:09, Wei Yang wrote: > >> On Tue, Aug 06, 2019 at 11:29:52AM +0200, Vlastimil Babka wrote: > >> >On 8/6/19 10:11 AM, Wei Yang wrote: > >> >> When addr is out of the range of the whole rb_tree, pprev will points to > >> >> the biggest node. find_vma_prev gets is by going through the right most > >> > > >> >s/biggest/last/ ? or right-most? > >> > > >> >> node of the tree. > >> >> > >> >> Since only the last node is the one it is looking for, it is not > >> >> necessary to assign pprev to those middle stage nodes. By assigning > >> >> pprev to the last node directly, it tries to improve the function > >> >> locality a little. > >> > > >> >In the end, it will always write to the cacheline of pprev. The caller has most > >> >likely have it on stack, so it's already hot, and there's no other CPU stealing > >> >it. So I don't understand where the improved locality comes from. The compiler > >> >can also optimize the patched code so the assembly is identical to the previous > >> >code, or vice versa. Did you check for differences? > >> > >> Vlastimil > >> > >> Thanks for your comment. > >> > >> I believe you get a point. I may not use the word locality. This patch tries > >> to reduce some unnecessary assignment of pprev. > >> > >> Original code would assign the value on each node during iteration, this is > >> what I want to reduce. > > > >Is there any measurable difference (on micro benchmarks or regular > >workloads)? > > I wrote a test case to compare these two methods, but not find visible > difference in run time. What is the point in changing this code if it doesn't lead to any measurable improvement?
On Thu, Aug 08, 2019 at 08:02:10AM +0200, Michal Hocko wrote: >On Thu 08-08-19 11:26:38, Wei Yang wrote: >> On Wed, Aug 07, 2019 at 09:51:01AM +0200, Michal Hocko wrote: >> >On Wed 07-08-19 08:31:09, Wei Yang wrote: >> >> On Tue, Aug 06, 2019 at 11:29:52AM +0200, Vlastimil Babka wrote: >> >> >On 8/6/19 10:11 AM, Wei Yang wrote: >> >> >> When addr is out of the range of the whole rb_tree, pprev will points to >> >> >> the biggest node. find_vma_prev gets is by going through the right most >> >> > >> >> >s/biggest/last/ ? or right-most? >> >> > >> >> >> node of the tree. >> >> >> >> >> >> Since only the last node is the one it is looking for, it is not >> >> >> necessary to assign pprev to those middle stage nodes. By assigning >> >> >> pprev to the last node directly, it tries to improve the function >> >> >> locality a little. >> >> > >> >> >In the end, it will always write to the cacheline of pprev. The caller has most >> >> >likely have it on stack, so it's already hot, and there's no other CPU stealing >> >> >it. So I don't understand where the improved locality comes from. The compiler >> >> >can also optimize the patched code so the assembly is identical to the previous >> >> >code, or vice versa. Did you check for differences? >> >> >> >> Vlastimil >> >> >> >> Thanks for your comment. >> >> >> >> I believe you get a point. I may not use the word locality. This patch tries >> >> to reduce some unnecessary assignment of pprev. >> >> >> >> Original code would assign the value on each node during iteration, this is >> >> what I want to reduce. >> > >> >Is there any measurable difference (on micro benchmarks or regular >> >workloads)? >> >> I wrote a test case to compare these two methods, but not find visible >> difference in run time. > >What is the point in changing this code if it doesn't lead to any >measurable improvement? You are right. >-- >Michal Hocko >SUSE Labs
On 8/8/19 5:26 AM, Wei Yang wrote: > > @@ -2270,12 +2270,9 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, > if (vma) { > *pprev = vma->vm_prev; > } else { > - struct rb_node *rb_node = mm->mm_rb.rb_node; > - *pprev = NULL; > - while (rb_node) { > - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); > - rb_node = rb_node->rb_right; > - } > + struct rb_node *rb_node = rb_last(&mm->mm_rb); > + *pprev = !rb_node ? NULL : > + rb_entry(rb_node, struct vm_area_struct, vm_rb); > } > return vma; > > Not sure this style would help a little in understanding the code? Yeah using rb_last() would be nicer than basically repeating its implementation, so it's fine as a cleanup without performance implications. >> -- >> Michal Hocko >> SUSE Labs >
On Thu, Aug 08, 2019 at 10:49:29AM +0200, Vlastimil Babka wrote: >On 8/8/19 5:26 AM, Wei Yang wrote: >> >> @@ -2270,12 +2270,9 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, >> if (vma) { >> *pprev = vma->vm_prev; >> } else { >> - struct rb_node *rb_node = mm->mm_rb.rb_node; >> - *pprev = NULL; >> - while (rb_node) { >> - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); >> - rb_node = rb_node->rb_right; >> - } >> + struct rb_node *rb_node = rb_last(&mm->mm_rb); >> + *pprev = !rb_node ? NULL : >> + rb_entry(rb_node, struct vm_area_struct, vm_rb); >> } >> return vma; >> >> Not sure this style would help a little in understanding the code? > >Yeah using rb_last() would be nicer than basically repeating its >implementation, so it's fine as a cleanup without performance implications. > Thanks, I would send this version with proper change log. >>> -- >>> Michal Hocko >>> SUSE Labs >>
diff --git a/mm/mmap.c b/mm/mmap.c index 7e8c3e8ae75f..284bc7e51f9c 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -2271,11 +2271,10 @@ find_vma_prev(struct mm_struct *mm, unsigned long addr, *pprev = vma->vm_prev; } else { struct rb_node *rb_node = mm->mm_rb.rb_node; - *pprev = NULL; - while (rb_node) { - *pprev = rb_entry(rb_node, struct vm_area_struct, vm_rb); + while (rb_node && rb_node->rb_right) rb_node = rb_node->rb_right; - } + *pprev = rb_node ? NULL + : rb_entry(rb_node, struct vm_area_struct, vm_rb); } return vma; }
When addr is out of the range of the whole rb_tree, pprev will points to the biggest node. find_vma_prev gets is by going through the right most node of the tree. Since only the last node is the one it is looking for, it is not necessary to assign pprev to those middle stage nodes. By assigning pprev to the last node directly, it tries to improve the function locality a little. Signed-off-by: Wei Yang <richardw.yang@linux.intel.com> --- mm/mmap.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)