
[6/7] Btrfs: __tree_mod_log_insert decrease max compare count

Message ID 20170607005844.2078-7-nefelim4ag@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Timofey Titovets June 7, 2017, 12:58 a.m. UTC
In the worst case the code does 4 comparisons;
nest the checks so the right branch is taken sooner
and the worst case drops to 3 comparisons.

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
---
 fs/btrfs/ctree.c | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

Comments

Filipe Manana June 7, 2017, 9:45 a.m. UTC | #1
On Wed, Jun 7, 2017 at 1:58 AM, Timofey Titovets <nefelim4ag@gmail.com> wrote:
> In the worst case the code does 4 comparisons;
> nest the checks so the right branch is taken sooner
> and the worst case drops to 3 comparisons.

Some comments below.

>
> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> ---
>  fs/btrfs/ctree.c | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a3a75f1de..318379531 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -460,15 +460,17 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
>         while (*new) {
>                 cur = rb_entry(*new, struct tree_mod_elem, node);
>                 parent = *new;
> -               if (cur->logical < tm->logical)
> -                       new = &((*new)->rb_left);
> -               else if (cur->logical > tm->logical)
> -                       new = &((*new)->rb_right);
> -               else if (cur->seq < tm->seq)
> -                       new = &((*new)->rb_left);
> -               else if (cur->seq > tm->seq)
> -                       new = &((*new)->rb_right);
> -               else
> +               if (cur->logical != tm->logical) {
> +                       if (cur->logical < tm->logical)
> +                               new = &((*new)->rb_left);

So now when cur->logical is less than tm->logical, we do 2 comparisons
instead of 1.
Is this case always much less common? If it were, this could be a good
change. What guarantees that this change gives better performance for
common workloads? What guarantees that the example I just gave isn't
typical for most/many common workloads?

And most importantly, did you actually do some benchmarks that justify
this change? Is this a hot code path that would benefit from such
changes?
(Since it's the tree mod log, seldom used, I really doubt it).

Same comments apply to all the other similar patches you have sent.

thanks

> +                       else
> +                               new = &((*new)->rb_right);
> +               } else if (cur->seq != tm->seq) {
> +                       if (cur->seq < tm->seq)
> +                               new = &((*new)->rb_left);
> +                       else
> +                               new = &((*new)->rb_right);
> +               } else
>                         return -EEXIST;
>         }
>
> --
> 2.13.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
Timofey Titovets June 7, 2017, 11:01 a.m. UTC | #2
2017-06-07 12:45 GMT+03:00 Filipe Manana <fdmanana@gmail.com>:
> On Wed, Jun 7, 2017 at 1:58 AM, Timofey Titovets <nefelim4ag@gmail.com> wrote:
>> In the worst case the code does 4 comparisons;
>> nest the checks so the right branch is taken sooner
>> and the worst case drops to 3 comparisons.
>
> Some comments below.
>
>>
>> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
>> ---
>>  fs/btrfs/ctree.c | 20 +++++++++++---------
>>  1 file changed, 11 insertions(+), 9 deletions(-)
>>
>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>> index a3a75f1de..318379531 100644
>> --- a/fs/btrfs/ctree.c
>> +++ b/fs/btrfs/ctree.c
>> @@ -460,15 +460,17 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
>>         while (*new) {
>>                 cur = rb_entry(*new, struct tree_mod_elem, node);
>>                 parent = *new;
>> -               if (cur->logical < tm->logical)
>> -                       new = &((*new)->rb_left);
>> -               else if (cur->logical > tm->logical)
>> -                       new = &((*new)->rb_right);
>> -               else if (cur->seq < tm->seq)
>> -                       new = &((*new)->rb_left);
>> -               else if (cur->seq > tm->seq)
>> -                       new = &((*new)->rb_right);
>> -               else
>> +               if (cur->logical != tm->logical) {
>> +                       if (cur->logical < tm->logical)
>> +                               new = &((*new)->rb_left);
>
> So now when cur->logical is less than tm->logical, we do 2 comparisons
> instead of 1.
> Is this case always much less common? If it were, this could be a good
> change. What guarantees that this change gives better performance for
> common workloads? What guarantees that the example I just gave isn't
> typical for most/many common workloads?
>
> And most importantly, did you actually do some benchmarks that justify
> this change? Is this a hot code path that would benefit from such
> changes?
> (Since it's the tree mod log, seldom used, I really doubt it).
>
> Same comments apply to all the other similar patches you have sent.
>
> thanks
>

Nope, I just lack the experience to do such benchmarks, and the same
goes for understanding all of the btrfs code; it's difficult for me to
say whether a path is hot or cold.

I just reread all of the btrfs code, analyzed the parts I understand,
and tried to fix the places that look questionable.

In most cases I can explain why this can help (IMHO):
01 - __compare_inode_defrag - we compare inodes; in my experience,
on most distros and setups btrfs has 1-2 subvolumes (root_ids), so
most of the time root_id1 == root_id2
02 - backref_comp - same
03 - ref_node_cmp - same
04 - ref_tree_add - just a cleanup, the compiler does the same
https://godbolt.org/g/Ww93ws
05 - add_all_parents - just a cleanup, the compiler does the same
https://godbolt.org/g/srSGeW
07 - end_workqueue_bio - if I understand correctly, this code runs
on every write, so converting it to a switch will save several CPU
cycles on every write

For that function (06 - __tree_mod_log_insert):
I did the same as for 01-03 "by inertia" - as I just don't know all
the code, I assume that, since this function works with an rb tree,
the values will be equally distributed

Thanks
David Sterba June 13, 2017, 2:24 p.m. UTC | #3
On Wed, Jun 07, 2017 at 02:01:46PM +0300, Timofey Titovets wrote:
> 2017-06-07 12:45 GMT+03:00 Filipe Manana <fdmanana@gmail.com>:
> > On Wed, Jun 7, 2017 at 1:58 AM, Timofey Titovets <nefelim4ag@gmail.com> wrote:
> >> In the worst case the code does 4 comparisons;
> >> nest the checks so the right branch is taken sooner
> >> and the worst case drops to 3 comparisons.
> >
> > Some comments below.
> >
> >>
> >> Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
> >> ---
> >>  fs/btrfs/ctree.c | 20 +++++++++++---------
> >>  1 file changed, 11 insertions(+), 9 deletions(-)
> >>
> >> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> >> index a3a75f1de..318379531 100644
> >> --- a/fs/btrfs/ctree.c
> >> +++ b/fs/btrfs/ctree.c
> >> @@ -460,15 +460,17 @@ __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
> >>         while (*new) {
> >>                 cur = rb_entry(*new, struct tree_mod_elem, node);
> >>                 parent = *new;
> >> -               if (cur->logical < tm->logical)
> >> -                       new = &((*new)->rb_left);
> >> -               else if (cur->logical > tm->logical)
> >> -                       new = &((*new)->rb_right);
> >> -               else if (cur->seq < tm->seq)
> >> -                       new = &((*new)->rb_left);
> >> -               else if (cur->seq > tm->seq)
> >> -                       new = &((*new)->rb_right);
> >> -               else
> >> +               if (cur->logical != tm->logical) {
> >> +                       if (cur->logical < tm->logical)
> >> +                               new = &((*new)->rb_left);
> >
> > So now when cur->logical is less than tm->logical, we do 2 comparisons
> > instead of 1.
> > Is this case always much less common? If it were, this could be a good
> > change. What guarantees that this change gives better performance for
> > common workloads? What guarantees that the example I just gave isn't
> > typical for most/many common workloads?
> >
> > And most importantly, did you actually do some benchmarks that justify
> > this change? Is this a hot code path that would benefit from such
> > changes?
> > (Since it's the tree mod log, seldom used, I really doubt it).
> >
> > Same comments apply to all the other similar patches you have sent.
> 
> Nope, I just lack the experience to do such benchmarks, and the same
> goes for understanding all of the btrfs code; it's difficult for me to
> say whether a path is hot or cold.

So if you call the patchset cleanups and then claim some
performance-related improvements, you can expect to be called out to
provide numbers or an analysis.

I've checked the resulting assembly of "Btrfs: backref_comp decrease max
compare count", which does a similar conversion of the sequence of ifs.
Surprise surprise, the assembly does not differ in the number of
branches. This, at minimum, can be checked before any profiling to see
whether the change makes sense.

If the assembly stays the same, code readability may be preferred,
which is one of the points of cleanup patches. In that case I prefer the
current code over your change, but that may just be personal preference,
and because I'm used to reading the "linear ifs" pattern.

> I just reread all of the btrfs code, analyzed the parts I understand,
> and tried to fix the places that look questionable.
> 
> In most cases I can explain why this can help (IMHO):
> 01 - __compare_inode_defrag - we compare inodes; in my experience,
> on most distros and setups btrfs has 1-2 subvolumes (root_ids), so
> most of the time root_id1 == root_id2
> 02 - backref_comp - same
> 03 - ref_node_cmp - same
> 04 - ref_tree_add - just a cleanup, the compiler does the same
> https://godbolt.org/g/Ww93ws
> 05 - add_all_parents - just a cleanup, the compiler does the same
> https://godbolt.org/g/srSGeW
> 07 - end_workqueue_bio - if I understand correctly, this code runs
> on every write, so converting it to a switch will save several CPU
> cycles on every write
> 
> For that function (06 - __tree_mod_log_insert):
> I did the same as for 01-03 "by inertia" - as I just don't know all
> the code, I assume that, since this function works with an rb tree,
> the values will be equally distributed

From my point above, I'll not merge patches 1, 2, 3 and 6. Patches 4 and 5
seem ok.

Patch 7 does not result in equivalent code, as reported by the
kbuild robot. But converting the ifs to a switch is desirable in
end_workqueue_bio, namely in the second branch, as it's not clear how
some of the BTRFS_WQ_ENDIO_* values are handled.

If you still care, please resend 4 and 5. The changelog does not need to
repeat the code; rather, describe how the change simplifies the code and
makes it more readable.

Patch

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a3a75f1de..318379531 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -460,15 +460,17 @@  __tree_mod_log_insert(struct btrfs_fs_info *fs_info, struct tree_mod_elem *tm)
 	while (*new) {
 		cur = rb_entry(*new, struct tree_mod_elem, node);
 		parent = *new;
-		if (cur->logical < tm->logical)
-			new = &((*new)->rb_left);
-		else if (cur->logical > tm->logical)
-			new = &((*new)->rb_right);
-		else if (cur->seq < tm->seq)
-			new = &((*new)->rb_left);
-		else if (cur->seq > tm->seq)
-			new = &((*new)->rb_right);
-		else
+		if (cur->logical != tm->logical) {
+			if (cur->logical < tm->logical)
+				new = &((*new)->rb_left);
+			else
+				new = &((*new)->rb_right);
+		} else if (cur->seq != tm->seq) {
+			if (cur->seq < tm->seq)
+				new = &((*new)->rb_left);
+			else
+				new = &((*new)->rb_right);
+		} else
 			return -EEXIST;
 	}