diff mbox series

[1/3] maple_tree: use ma_data_end() in mas_data_end()

Message ID 20240831001053.4751-1-richard.weiyang@gmail.com (mailing list archive)
State New
Headers show
Series [1/3] maple_tree: use ma_data_end() in mas_data_end() | expand

Commit Message

Wei Yang Aug. 31, 2024, 12:10 a.m. UTC
These two function share almost the same code and do the same thing.

Let's just call ma_data_end() in mas_data_end() to reduce duplicate
code.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 lib/maple_tree.c | 22 ++++------------------
 1 file changed, 4 insertions(+), 18 deletions(-)

Comments

Liam R. Howlett Sept. 3, 2024, 4:12 p.m. UTC | #1
* Wei Yang <richard.weiyang@gmail.com> [240830 20:11]:
> These two function share almost the same code and do the same thing.

Please stop trying to optimise code like this.  I don't have time to
verify you are not making things worse and you haven't done any of the
testing required.

I went through with perf to optimise dereferences and static inline
functions, so unless you are prepared to look at the generated code and
actually benchmark, please stop sending patches that I need to verify
like this myself.

> 
> Let's just call ma_data_end() in mas_data_end() to reduce duplicate
> code.
> 
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> ---
>  lib/maple_tree.c | 22 ++++------------------
>  1 file changed, 4 insertions(+), 18 deletions(-)
> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index b7d747a7938e..85668246f944 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -1435,28 +1435,14 @@ static __always_inline unsigned char ma_data_end(struct maple_node *node,
>   */
>  static inline unsigned char mas_data_end(struct ma_state *mas)
>  {
> -	enum maple_type type;
> -	struct maple_node *node;
> -	unsigned char offset;
> -	unsigned long *pivots;
> -
> -	type = mte_node_type(mas->node);
> -	node = mas_mn(mas);
> -	if (type == maple_arange_64)
> -		return ma_meta_end(node, type);
> +	enum maple_type type = mte_node_type(mas->node);
> +	struct maple_node *node = mas_mn(mas);
> +	unsigned long *pivots = ma_pivots(node, type);
>  
> -	pivots = ma_pivots(node, type);
>  	if (unlikely(ma_dead_node(node)))
>  		return 0;
>  
> -	offset = mt_pivots[type] - 1;
> -	if (likely(!pivots[offset]))
> -		return ma_meta_end(node, type);
> -
> -	if (likely(pivots[offset] == mas->max))
> -		return offset;
> -
> -	return mt_pivots[type];
> +	return ma_data_end(node, type, pivots, mas->max);
>  }
>  
>  /*
> -- 
> 2.34.1
> 
>
Wei Yang Sept. 4, 2024, 12:15 a.m. UTC | #2
On Tue, Sep 03, 2024 at 12:12:39PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [240830 20:11]:
>> These two function share almost the same code and do the same thing.
>
>Please stop trying to optimise code like this.  I don't have time to
>verify you are not making things worse and you haven't done any of the
>testing required.
>
>I went through with perf to optimise dereferences and static inline
>functions, so unless you are prepared to look at the generated code and
>actually benchmark, please stop sending patches that I need to verify
>like this myself.
>

Would you mind letting me know how could I verify the generated code and
benchmark? What you expect to do?

For example this one, these two functions share the same code. What's your
concern for this? The inline function expansion?

>> 
>> Let's just call ma_data_end() in mas_data_end() to reduce duplicate
>> code.
>> 
>> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
>> ---
>>  lib/maple_tree.c | 22 ++++------------------
>>  1 file changed, 4 insertions(+), 18 deletions(-)
>> 
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index b7d747a7938e..85668246f944 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -1435,28 +1435,14 @@ static __always_inline unsigned char ma_data_end(struct maple_node *node,
>>   */
>>  static inline unsigned char mas_data_end(struct ma_state *mas)
>>  {
>> -	enum maple_type type;
>> -	struct maple_node *node;
>> -	unsigned char offset;
>> -	unsigned long *pivots;
>> -
>> -	type = mte_node_type(mas->node);
>> -	node = mas_mn(mas);
>> -	if (type == maple_arange_64)
>> -		return ma_meta_end(node, type);
>> +	enum maple_type type = mte_node_type(mas->node);
>> +	struct maple_node *node = mas_mn(mas);
>> +	unsigned long *pivots = ma_pivots(node, type);
>>  
>> -	pivots = ma_pivots(node, type);
>>  	if (unlikely(ma_dead_node(node)))
>>  		return 0;
>>  
>> -	offset = mt_pivots[type] - 1;
>> -	if (likely(!pivots[offset]))
>> -		return ma_meta_end(node, type);
>> -
>> -	if (likely(pivots[offset] == mas->max))
>> -		return offset;
>> -
>> -	return mt_pivots[type];
>> +	return ma_data_end(node, type, pivots, mas->max);
>>  }
>>  
>>  /*
>> -- 
>> 2.34.1
>> 
>>
Liam R. Howlett Sept. 4, 2024, 2:25 a.m. UTC | #3
* Wei Yang <richard.weiyang@gmail.com> [240903 20:15]:
> On Tue, Sep 03, 2024 at 12:12:39PM -0400, Liam R. Howlett wrote:
> >* Wei Yang <richard.weiyang@gmail.com> [240830 20:11]:
> >> These two function share almost the same code and do the same thing.
> >
> >Please stop trying to optimise code like this.  I don't have time to
> >verify you are not making things worse and you haven't done any of the
> >testing required.
> >
> >I went through with perf to optimise dereferences and static inline
> >functions, so unless you are prepared to look at the generated code and
> >actually benchmark, please stop sending patches that I need to verify
> >like this myself.
> >
> 
> Would you mind letting me know how could I verify the generated code and
> benchmark? What you expect to do?

There are a number of benchmarks in the test code that will run specific
tasks in iterations to check that there is not a regression.  Depending
on what you change, you have to run the right one.  Usually I write one
if one doesn't exist.

You can also use perf record and see what happens on specific tests.

This function is all over the place, so I really don't know which one to
run, and I don't have time to write a test for your changes - and I
don't think this is worth it.

> 
> For example this one, these two functions share the same code.

Do they?

You have removed the fast path for maple_arange_64 using the metadata
end before getting the pivots, which means on basically all vma walks we
will be adding instructions to the walking of the tree (get the pivots,
check if the node is dead, check if the pivots pointer is null).  You
have also added a check for if (!pivots), which is essentially checking
if a dead node was hit - which is already checked in the mas_data_end().
These two checks are the reasons I left both copies of this function in
place. I am looking to reduce the execution time, not decrease the line
count of the file.

So, the last 8 lines of both functions are the same. And please don't
submit a patch that adds an inline function that does the last 8 lines
to reduce duplication.

> What's your
> concern for this? The inline function expansion?

Your clean up is compressing setting variables to the same line, which
is a bad change.  It is better to have verbose code where it won't make
a difference in what is compiled.  And certainly not worth adding after
the fact.

The inline of the mas_data_end() function depends on the expansion
happening by the compiler (which might change based on the version or if
it's gcc vs clang), sure that's a bit of a concern.

The biggest annoying thing about this whole patch series (without a
cover letter) is that it isn't doing anything for fixing, helping, or
adding functionality.  I'm a big fan of forward progress and this isn't
it.

It is only changing code for the sake of changing code.  And it looks
like it will be slower, or the same speed if we are lucky.  I have to
take time to verify things aren't slower or add subtle issues (maybe an
RCU race) because the code looked similar.  It's just not worth it.

> 
> >> 
> >> Let's just call ma_data_end() in mas_data_end() to reduce duplicate
> >> code.
> >> 
> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> >> ---
> >>  lib/maple_tree.c | 22 ++++------------------
> >>  1 file changed, 4 insertions(+), 18 deletions(-)
> >> 
> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> >> index b7d747a7938e..85668246f944 100644
> >> --- a/lib/maple_tree.c
> >> +++ b/lib/maple_tree.c
> >> @@ -1435,28 +1435,14 @@ static __always_inline unsigned char ma_data_end(struct maple_node *node,
> >>   */
> >>  static inline unsigned char mas_data_end(struct ma_state *mas)
> >>  {
> >> -	enum maple_type type;
> >> -	struct maple_node *node;
> >> -	unsigned char offset;
> >> -	unsigned long *pivots;
> >> -
> >> -	type = mte_node_type(mas->node);
> >> -	node = mas_mn(mas);
> >> -	if (type == maple_arange_64)
> >> -		return ma_meta_end(node, type);
> >> +	enum maple_type type = mte_node_type(mas->node);
> >> +	struct maple_node *node = mas_mn(mas);
> >> +	unsigned long *pivots = ma_pivots(node, type);
> >>  
> >> -	pivots = ma_pivots(node, type);
> >>  	if (unlikely(ma_dead_node(node)))
> >>  		return 0;
> >>  
> >> -	offset = mt_pivots[type] - 1;
> >> -	if (likely(!pivots[offset]))
> >> -		return ma_meta_end(node, type);
> >> -
> >> -	if (likely(pivots[offset] == mas->max))
> >> -		return offset;
> >> -
> >> -	return mt_pivots[type];
> >> +	return ma_data_end(node, type, pivots, mas->max);
> >>  }
> >>  
> >>  /*
> >> -- 
> >> 2.34.1
> >> 
> >> 
> 
> -- 
> Wei Yang
> Help you, Help me
Wei Yang Sept. 4, 2024, 7:58 a.m. UTC | #4
On Tue, Sep 03, 2024 at 10:25:10PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [240903 20:15]:
>> On Tue, Sep 03, 2024 at 12:12:39PM -0400, Liam R. Howlett wrote:
>> >* Wei Yang <richard.weiyang@gmail.com> [240830 20:11]:
>> >> These two function share almost the same code and do the same thing.
>> >
>> >Please stop trying to optimise code like this.  I don't have time to
>> >verify you are not making things worse and you haven't done any of the
>> >testing required.
>> >
>> >I went through with perf to optimise dereferences and static inline
>> >functions, so unless you are prepared to look at the generated code and
>> >actually benchmark, please stop sending patches that I need to verify
>> >like this myself.
>> >
>> 
>> Would you mind letting me know how could I verify the generated code and
>> benchmark? What you expect to do?
>

Liam,

Thanks for your patient reply.

>There are a number of benchmarks in the test code that will run specific
>tasks in iterations to check that there is not a regression.  Depending
>on what you change, you have to run the right one.  Usually I write one
>if one doesn't exist.

Not familiar with benchmarks. Would you mind naming someone? or the ones you
have written? So that I can learn to test or write one later.

>
>You can also use perf record and see what happens on specific tests.
>
>This function is all over the place, so I really don't know which one to
>run, and I don't have time to write a test for your changes - and I
>don't think this is worth it.
>
>> 
>> For example this one, these two functions share the same code.
>
>Do they?
>
>You have removed the fast path for maple_arange_64 using the metadata
>end before getting the pivots, which means on basically all vma walks we
>will be adding instructions to the walking of the tree (get the pivots,
>check if the node is dead, check if the pivots pointer is null).  You
>have also added a check for if (!pivots), which is essentially checking
>if a dead node was hit - which is already checked in the mas_data_end().
>These two checks are the reasons I left both copies of this function in
>place. I am looking to reduce the execution time, not decrease the line
>count of the file.
>

You are right, I should pay more attention to the fast path opt.

>So, the last 8 lines of both functions are the same. And please don't
>submit a patch that adds an inline function that does the last 8 lines
>to reduce duplication.
>

Ok, I will not try to call an inline function just to reduce some similar
code.

>> What's your
>> concern for this? The inline function expansion?
>
>Your clean up is compressing setting variables to the same line, which
>is a bad change.  It is better to have verbose code where it won't make
>a difference in what is compiled.  And certainly not worth adding after
>the fact.
>
>The inline of the mas_data_end() function depends on the expansion
>happening by the compiler (which might change based on the version or if
>it's gcc vs clang), sure that's a bit of a concern.
>
>The biggest annoying thing about this whole patch series (without a
>cover letter) is that it isn't doing anything for fixing, helping, or
>adding functionality.  I'm a big fan of forward progress and this isn't
>it.
>
>It is only changing code for the sake of changing code.  And it looks
>like it will be slower, or the same speed if we are lucky.  I have to
>take time to verify things aren't slower or add subtle issues (maybe an
>RCU race) because the code looked similar.  It's just not worth it.
>

I am trying to make the code more easy to read, but seems not helping.

BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access
the first parent, parent's type and its gap twice. Once in mas_update_gap()
and once in mas_parent_gap().

Do you think it worth a change to reduce one?
Wei Yang Sept. 4, 2024, 2:53 p.m. UTC | #5
On Wed, Sep 04, 2024 at 07:58:19AM +0000, Wei Yang wrote:
[...]
>>It is only changing code for the sake of changing code.  And it looks
>>like it will be slower, or the same speed if we are lucky.  I have to
>>take time to verify things aren't slower or add subtle issues (maybe an
>>RCU race) because the code looked similar.  It's just not worth it.
>>
>
>I am trying to make the code more easy to read, but seems not helping.
>
>BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access
>the first parent, parent's type and its gap twice. Once in mas_update_gap()
>and once in mas_parent_gap().
>
>Do you think it worth a change to reduce one?
>

Liam,

I am trying to understand what kind code change you don't like. 

Is the following change worth?

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 2b310dd3addf..e331d086eb7c 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1595,32 +1595,33 @@ static inline unsigned long mas_max_gap(struct ma_state *mas)
 /*
  * mas_parent_gap() - Set the parent gap and any gaps above, as needed
  * @mas: The maple state
- * @offset: The gap offset in the parent to set
  * @new: The new gap value.
  *
  * Set the parent gap then continue to set the gap upwards, using the metadata
  * of the parent to see if it is necessary to check the node above.
  */
-static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
-		unsigned long new)
+static inline void mas_parent_gap(struct ma_state *mas, unsigned long new)
 {
 	unsigned long meta_gap = 0;
 	struct maple_node *pnode;
-	struct maple_enode *penode;
+	struct maple_enode *enode = mas->node;
 	unsigned long *pgaps;
-	unsigned char meta_offset;
+	unsigned char offset, meta_offset;
 	enum maple_type pmt;
 
-	pnode = mte_parent(mas->node);
-	pmt = mas_parent_type(mas, mas->node);
-	penode = mt_mk_node(pnode, pmt);
+ascend:
+	pnode = mte_parent(enode);
+	pmt = mas_parent_type(mas, enode);
+	offset = mte_parent_slot(enode);
 	pgaps = ma_gaps(pnode, pmt);
 
-ascend:
 	MAS_BUG_ON(mas, pmt != maple_arange_64);
 	meta_offset = ma_meta_gap(pnode);
 	meta_gap = pgaps[meta_offset];
 
+	if (pgaps[offset] == new)
+		return;
+
 	pgaps[offset] = new;
 
 	if (meta_gap == new)
@@ -1640,11 +1641,7 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
 		return;
 
 	/* Go to the parent node. */
-	pnode = mte_parent(penode);
-	pmt = mas_parent_type(mas, penode);
-	pgaps = ma_gaps(pnode, pmt);
-	offset = mte_parent_slot(penode);
-	penode = mt_mk_node(pnode, pmt);
+	enode = mt_mk_node(pnode, pmt);
 	goto ascend;
 }
 
@@ -1654,24 +1651,13 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
  */
 static inline void mas_update_gap(struct ma_state *mas)
 {
-	unsigned char pslot;
-	unsigned long p_gap;
-	unsigned long max_gap;
-
 	if (!mt_is_alloc(mas->tree))
 		return;
 
 	if (mte_is_root(mas->node))
 		return;
 
-	max_gap = mas_max_gap(mas);
-
-	pslot = mte_parent_slot(mas->node);
-	p_gap = ma_gaps(mte_parent(mas->node),
-			mas_parent_type(mas, mas->node))[pslot];
-
-	if (p_gap != max_gap)
-		mas_parent_gap(mas, pslot, max_gap);
+	mas_parent_gap(mas, mas_max_gap(mas));
 }
 
 /*
Liam R. Howlett Sept. 5, 2024, 8:13 p.m. UTC | #6
* Wei Yang <richard.weiyang@gmail.com> [240904 10:53]:
> On Wed, Sep 04, 2024 at 07:58:19AM +0000, Wei Yang wrote:
> [...]
> >>It is only changing code for the sake of changing code.  And it looks
> >>like it will be slower, or the same speed if we are lucky.  I have to
> >>take time to verify things aren't slower or add subtle issues (maybe an
> >>RCU race) because the code looked similar.  It's just not worth it.
> >>
> >
> >I am trying to make the code more easy to read, but seems not helping.
> >
> >BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access
> >the first parent, parent's type and its gap twice. Once in mas_update_gap()
> >and once in mas_parent_gap().
> >
> >Do you think it worth a change to reduce one?
> >
> 
> Liam,
> 
> I am trying to understand what kind code change you don't like. 

Any change that cleans things up and verifies the performance would be
acceptable, but your previous changes didn't do that so the burden is on
me to verify that your code isn't going to cause a regression.

> 
> Is the following change worth?

Not like it is right now, but this is worth fixing.

Test using the tools/test/radix-tree/maple.c  Look in that file for
BENCH defines at the top, and examples of the benchmarking.

Before you submit anything, run the testing to make sure it all passes.

If you are changing anything for cleanup/optimisation, then write a
benchmark that will test the runtime, add that before your change and
run it in both before/after with the results.

Look also into the perf tool to see what is going on and where your time
is spent, then you can avoid optimising something that's not worth
optimising.

> 
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 2b310dd3addf..e331d086eb7c 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -1595,32 +1595,33 @@ static inline unsigned long mas_max_gap(struct ma_state *mas)
>  /*
>   * mas_parent_gap() - Set the parent gap and any gaps above, as needed
>   * @mas: The maple state
> - * @offset: The gap offset in the parent to set
>   * @new: The new gap value.
>   *
>   * Set the parent gap then continue to set the gap upwards, using the metadata
>   * of the parent to see if it is necessary to check the node above.
>   */
> -static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
> -		unsigned long new)
> +static inline void mas_parent_gap(struct ma_state *mas, unsigned long new)
>  {
>  	unsigned long meta_gap = 0;
>  	struct maple_node *pnode;
> -	struct maple_enode *penode;
> +	struct maple_enode *enode = mas->node;

Set this with the rest of the configuration, before the ascend label
would make more sense and be more clear.

>  	unsigned long *pgaps;
> -	unsigned char meta_offset;
> +	unsigned char offset, meta_offset;

Make this two lines.

>  	enum maple_type pmt;
>  
> -	pnode = mte_parent(mas->node);
> -	pmt = mas_parent_type(mas, mas->node);
> -	penode = mt_mk_node(pnode, pmt);
> +ascend:
> +	pnode = mte_parent(enode);
> +	pmt = mas_parent_type(mas, enode);
> +	offset = mte_parent_slot(enode);
>  	pgaps = ma_gaps(pnode, pmt);
>  
> -ascend:
>  	MAS_BUG_ON(mas, pmt != maple_arange_64);
>  	meta_offset = ma_meta_gap(pnode);
>  	meta_gap = pgaps[meta_offset];
>  
> +	if (pgaps[offset] == new)
> +		return;
> +

So every time we go up a level in the tree, we will now check if the
offset gap is the same as the new gap?  Doesn't this mean every level
now has an extra check that was previously only done for the first
level?  This is an unreasonable trade off.

I don't like what is there today, but I don't see this as a meaningful
improvement and I suspect this extra check is going to cost more than
the extra hot-cache check that exists today.

>  	pgaps[offset] = new;
>  
>  	if (meta_gap == new)
> @@ -1640,11 +1641,7 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
>  		return;
>  
>  	/* Go to the parent node. */
> -	pnode = mte_parent(penode);
> -	pmt = mas_parent_type(mas, penode);
> -	pgaps = ma_gaps(pnode, pmt);
> -	offset = mte_parent_slot(penode);
> -	penode = mt_mk_node(pnode, pmt);
> +	enode = mt_mk_node(pnode, pmt);

Essentially, you have replaced the penode with enode and reversed the
order of setting things up.  This seems reasonable, as it reduces the
lines of code.

If you undo this change, you can move the check back outside the loop
and avoid checking it every iteration and avoid the double check you are
trying to avoid.

>  	goto ascend;
>  }
>  
> @@ -1654,24 +1651,13 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
>   */
>  static inline void mas_update_gap(struct ma_state *mas)
>  {
> -	unsigned char pslot;
> -	unsigned long p_gap;
> -	unsigned long max_gap;
> -
>  	if (!mt_is_alloc(mas->tree))
>  		return;
>  
>  	if (mte_is_root(mas->node))
>  		return;
>  
> -	max_gap = mas_max_gap(mas);
> -
> -	pslot = mte_parent_slot(mas->node);
> -	p_gap = ma_gaps(mte_parent(mas->node),
> -			mas_parent_type(mas, mas->node))[pslot];
> -
> -	if (p_gap != max_gap)
> -		mas_parent_gap(mas, pslot, max_gap);
> +	mas_parent_gap(mas, mas_max_gap(mas));

This was optimised to skip updating the parent if we don't need to - and
the parent update was called elsewhere in the past.  Since there's not
much here, we can probably delete this function and rename the
mas_parent_gap() with the two tests here for mt_is_alloc() and
mte_is_root().

...

Thanks,
Liam
Wei Yang Sept. 6, 2024, 3:44 a.m. UTC | #7
On Thu, Sep 05, 2024 at 04:13:15PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [240904 10:53]:
>> On Wed, Sep 04, 2024 at 07:58:19AM +0000, Wei Yang wrote:
>> [...]
>> >>It is only changing code for the sake of changing code.  And it looks
>> >>like it will be slower, or the same speed if we are lucky.  I have to
>> >>take time to verify things aren't slower or add subtle issues (maybe an
>> >>RCU race) because the code looked similar.  It's just not worth it.
>> >>
>> >
>> >I am trying to make the code more easy to read, but seems not helping.
>> >
>> >BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access
>> >the first parent, parent's type and its gap twice. Once in mas_update_gap()
>> >and once in mas_parent_gap().
>> >
>> >Do you think it worth a change to reduce one?
>> >
>> 
>> Liam,
>> 
>> I am trying to understand what kind code change you don't like. 
>
>Any change that cleans things up and verifies the performance would be
>acceptable, but your previous changes didn't do that so the burden is on
>me to verify that your code isn't going to cause a regression.
>

Thanks for your reply. It contains much information which I am trying to
digest. If I misunderstand, please let me know.

>> 
>> Is the following change worth?
>
>Not like it is right now, but this is worth fixing.
>
>Test using the tools/test/radix-tree/maple.c  Look in that file for
>BENCH defines at the top, and examples of the benchmarking.
>

I guess I could run them by comment out those #define at the beginning of
lib/test_maple_tree.c?

I have comment out one of it, what I get is:

    TEST STARTING
    
    maple_tree: 80000597 of 80000597 tests passed

My question is how we leverage this test case? From the output itself
I just see all passed, but I am not sure it breaks the benchmark or not.

>Before you submit anything, run the testing to make sure it all passes.
>

Yes, I make and run ./maple for each change.

>If you are changing anything for cleanup/optimisation, then write a
>benchmark that will test the runtime, add that before your change and
>run it in both before/after with the results.
>

I guess is to add a #define BENCH_XXX with related function and call it in
maple_tree_seed(), right?

>Look also into the perf tool to see what is going on and where your time
>is spent, then you can avoid optimising something that's not worth
>optimising.
>

This is use the perf tool on the new added benchmark test?

I am lack of the experience of perf tool. I may need to spend some time to use
it. Usually we begin with 'perf record ./maple', right?

>> 
>> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
>> index 2b310dd3addf..e331d086eb7c 100644
>> --- a/lib/maple_tree.c
>> +++ b/lib/maple_tree.c
>> @@ -1595,32 +1595,33 @@ static inline unsigned long mas_max_gap(struct ma_state *mas)
>>  /*
>>   * mas_parent_gap() - Set the parent gap and any gaps above, as needed
>>   * @mas: The maple state
>> - * @offset: The gap offset in the parent to set
>>   * @new: The new gap value.
>>   *
>>   * Set the parent gap then continue to set the gap upwards, using the metadata
>>   * of the parent to see if it is necessary to check the node above.
>>   */
>> -static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
>> -		unsigned long new)
>> +static inline void mas_parent_gap(struct ma_state *mas, unsigned long new)
>>  {
>>  	unsigned long meta_gap = 0;
>>  	struct maple_node *pnode;
>> -	struct maple_enode *penode;
>> +	struct maple_enode *enode = mas->node;
>
>Set this with the rest of the configuration, before the ascend label
>would make more sense and be more clear.
>
>>  	unsigned long *pgaps;
>> -	unsigned char meta_offset;
>> +	unsigned char offset, meta_offset;
>
>Make this two lines.
>
>>  	enum maple_type pmt;
>>  
>> -	pnode = mte_parent(mas->node);
>> -	pmt = mas_parent_type(mas, mas->node);
>> -	penode = mt_mk_node(pnode, pmt);
>> +ascend:
>> +	pnode = mte_parent(enode);
>> +	pmt = mas_parent_type(mas, enode);
>> +	offset = mte_parent_slot(enode);
>>  	pgaps = ma_gaps(pnode, pmt);
>>  
>> -ascend:
>>  	MAS_BUG_ON(mas, pmt != maple_arange_64);
>>  	meta_offset = ma_meta_gap(pnode);
>>  	meta_gap = pgaps[meta_offset];
>>  
>> +	if (pgaps[offset] == new)
>> +		return;
>> +
>
>So every time we go up a level in the tree, we will now check if the
>offset gap is the same as the new gap?  Doesn't this mean every level
>now has an extra check that was previously only done for the first
>level?  This is an unreasonable trade off.

Yes, this seems not very good.

>
>I don't like what is there today, but I don't see this as a meaningful
>improvement and I suspect this extra check is going to cost more than
>the extra hot-cache check that exists today.
>

One thing I found is we may have this case, when we walk up the tree. This
happens when we have a node with two slots have the same max gap.

This means offset != meta_offset, but pgaps[offset] == pgaps[meta_offset].
Even we decrease pgaps[offset], the max gap of this node is not changed. So
we don't need to change gap in parent .

Currently we would first assign the same value then return because gap isn't
change. With this change, we first compare then set if not equal.

But this case is very rare if I am correct.

>>  	pgaps[offset] = new;
>>  
>>  	if (meta_gap == new)
>> @@ -1640,11 +1641,7 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
>>  		return;
>>  
>>  	/* Go to the parent node. */
>> -	pnode = mte_parent(penode);
>> -	pmt = mas_parent_type(mas, penode);
>> -	pgaps = ma_gaps(pnode, pmt);
>> -	offset = mte_parent_slot(penode);
>> -	penode = mt_mk_node(pnode, pmt);
>> +	enode = mt_mk_node(pnode, pmt);
>
>Essentially, you have replaced the penode with enode and reversed the
>order of setting things up.  This seems reasonable, as it reduces the
>lines of code.
>
>If you undo this change, you can move the check back outside the loop
>and avoid checking it every iteration and avoid the double check you are
>trying to avoid.
>
>>  	goto ascend;
>>  }
>>  
>> @@ -1654,24 +1651,13 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
>>   */
>>  static inline void mas_update_gap(struct ma_state *mas)
>>  {
>> -	unsigned char pslot;
>> -	unsigned long p_gap;
>> -	unsigned long max_gap;
>> -
>>  	if (!mt_is_alloc(mas->tree))
>>  		return;
>>  
>>  	if (mte_is_root(mas->node))
>>  		return;
>>  
>> -	max_gap = mas_max_gap(mas);
>> -
>> -	pslot = mte_parent_slot(mas->node);
>> -	p_gap = ma_gaps(mte_parent(mas->node),
>> -			mas_parent_type(mas, mas->node))[pslot];
>> -
>> -	if (p_gap != max_gap)
>> -		mas_parent_gap(mas, pslot, max_gap);
>> +	mas_parent_gap(mas, mas_max_gap(mas));
>
>This was optimised to skip updating the parent if we don't need to - and
>the parent update was called elsewhere in the past.  Since there's not
>much here, we can probably delete this function and rename the
>mas_parent_gap() with the two tests here for mt_is_alloc() and
>mte_is_root().
>

I guess your suggestion is to merge mas_update_gap() and mas_parent_gap().

Here is the drat, hope I get your idea.


diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index f81e3946b1d5..c04718e6acfa 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1593,28 +1593,34 @@ static inline unsigned long mas_max_gap(struct ma_state *mas)
 }
 
 /*
- * mas_parent_gap() - Set the parent gap and any gaps above, as needed
- * @mas: The maple state
- * @offset: The gap offset in the parent to set
- * @new: The new gap value.
- *
- * Set the parent gap then continue to set the gap upwards, using the metadata
- * of the parent to see if it is necessary to check the node above.
+ * mas_update_gap() - Update a nodes gaps and propagate up if necessary.
+ * @mas: the maple state.
  */
-static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
-		unsigned long new)
+static inline void mas_update_gap(struct ma_state *mas)
 {
 	unsigned long meta_gap = 0;
 	struct maple_node *pnode;
-	struct maple_enode *penode;
+	struct maple_enode *enode;
+	unsigned long new;
 	unsigned long *pgaps;
 	unsigned char meta_offset;
+	unsigned char offset;
 	enum maple_type pmt;
 
+	if (!mt_is_alloc(mas->tree))
+		return;
+
+	if (mte_is_root(mas->node))
+		return;
+
+	new = mas_max_gap(mas);
 	pnode = mte_parent(mas->node);
 	pmt = mas_parent_type(mas, mas->node);
-	penode = mt_mk_node(pnode, pmt);
 	pgaps = ma_gaps(pnode, pmt);
+	offset = mte_parent_slot(mas->node);
+
+	if (pgaps[offset] == new)
+		return;
 
 ascend:
 	MAS_BUG_ON(mas, pmt != maple_arange_64);
@@ -1640,38 +1646,13 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset,
 		return;
 
 	/* Go to the parent node. */
-	pnode = mte_parent(penode);
-	pmt = mas_parent_type(mas, penode);
+	enode = mt_mk_node(pnode, pmt);
+	pnode = mte_parent(enode);
+	pmt = mas_parent_type(mas, enode);
 	pgaps = ma_gaps(pnode, pmt);
-	offset = mte_parent_slot(penode);
-	penode = mt_mk_node(pnode, pmt);
-	goto ascend;
-}
+	offset = mte_parent_slot(enode);
 
-/*
- * mas_update_gap() - Update a nodes gaps and propagate up if necessary.
- * @mas: the maple state.
- */
-static inline void mas_update_gap(struct ma_state *mas)
-{
-	unsigned char pslot;
-	unsigned long p_gap;
-	unsigned long max_gap;
-
-	if (!mt_is_alloc(mas->tree))
-		return;
-
-	if (mte_is_root(mas->node))
-		return;
-
-	max_gap = mas_max_gap(mas);
-
-	pslot = mte_parent_slot(mas->node);
-	p_gap = ma_gaps(mte_parent(mas->node),
-			mas_parent_type(mas, mas->node))[pslot];
-
-	if (p_gap != max_gap)
-		mas_parent_gap(mas, pslot, max_gap);
+	goto ascend;
 }
 
 /*
Wei Yang Sept. 11, 2024, 11:15 p.m. UTC | #8
On Fri, Sep 06, 2024 at 03:44:10AM +0000, Wei Yang wrote:
>On Thu, Sep 05, 2024 at 04:13:15PM -0400, Liam R. Howlett wrote:
>>* Wei Yang <richard.weiyang@gmail.com> [240904 10:53]:
>>> On Wed, Sep 04, 2024 at 07:58:19AM +0000, Wei Yang wrote:
>>> [...]
>>> >>It is only changing code for the sake of changing code.  And it looks
>>> >>like it will be slower, or the same speed if we are lucky.  I have to
>>> >>take time to verify things aren't slower or add subtle issues (maybe an
>>> >>RCU race) because the code looked similar.  It's just not worth it.
>>> >>
>>> >
>>> >I am trying to make the code more easy to read, but seems not helping.
>>> >
>>> >BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access
>>> >the first parent, parent's type and its gap twice. Once in mas_update_gap()
>>> >and once in mas_parent_gap().
>>> >
>>> >Do you think it worth a change to reduce one?
>>> >
>>> 
>>> Liam,
>>> 
>>> I am trying to understand what kind code change you don't like. 
>>
>>Any change that cleans things up and verifies the performance would be
>>acceptable, but your previous changes didn't do that so the burden is on
>>me to verify that your code isn't going to cause a regression.
>>
>
>Thanks for your reply. It contains much information which I am trying to
>digest. If I misunderstand, please let me know.
>
>>> 
>>> Is the following change worth?
>>
>>Not like it is right now, but this is worth fixing.
>>
>>Test using the tools/test/radix-tree/maple.c  Look in that file for
>>BENCH defines at the top, and examples of the benchmarking.
>>
>
>I guess I could run them by comment out those #define at the beginning of
>lib/test_maple_tree.c?
>
>I have comment out one of it, what I get is:
>
>    TEST STARTING
>    
>    maple_tree: 80000597 of 80000597 tests passed
>
>My question is how we leverage this test case? From the output itself
>I just see all passed, but I am not sure it breaks the benchmark or not.
>
>>Before you submit anything, run the testing to make sure it all passes.
>>
>
>Yes, I make and run ./maple for each change.
>
>>If you are changing anything for cleanup/optimisation, then write a
>>benchmark that will test the runtime, add that before your change and
>>run it in both before/after with the results.
>>
>
>I guess is to add a #define BENCH_XXX with related function and call it in
>maple_tree_seed(), right?
>
>>Look also into the perf tool to see what is going on and where your time
>>is spent, then you can avoid optimising something that's not worth
>>optimising.
>>
>
>This is use the perf tool on the new added benchmark test?
>
>I am lack of the experience of perf tool. I may need to spend some time to use
>it. Usually we begin with 'perf record ./maple', right?
>

Liam,

If you have some time, would you mind telling me the correct way to use the
benchmark?

So that I can do the correct verification as you expected.
Liam R. Howlett Sept. 13, 2024, 2:13 p.m. UTC | #9
* Wei Yang <richard.weiyang@gmail.com> [240911 19:15]:
...
> 
> Liam,
> 
> If you have some time, would you mind telling me the correct way to use the
> benchmark?
> 
> So that I can do the correct verification as you expected.

I don't have time for that right now, sorry.

I'll try to get back to this when I can.

Thanks,
Liam
diff mbox series

Patch

diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index b7d747a7938e..85668246f944 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1435,28 +1435,14 @@  static __always_inline unsigned char ma_data_end(struct maple_node *node,
  */
 static inline unsigned char mas_data_end(struct ma_state *mas)
 {
-	enum maple_type type;
-	struct maple_node *node;
-	unsigned char offset;
-	unsigned long *pivots;
-
-	type = mte_node_type(mas->node);
-	node = mas_mn(mas);
-	if (type == maple_arange_64)
-		return ma_meta_end(node, type);
+	enum maple_type type = mte_node_type(mas->node);
+	struct maple_node *node = mas_mn(mas);
+	unsigned long *pivots = ma_pivots(node, type);
 
-	pivots = ma_pivots(node, type);
 	if (unlikely(ma_dead_node(node)))
 		return 0;
 
-	offset = mt_pivots[type] - 1;
-	if (likely(!pivots[offset]))
-		return ma_meta_end(node, type);
-
-	if (likely(pivots[offset] == mas->max))
-		return offset;
-
-	return mt_pivots[type];
+	return ma_data_end(node, type, pivots, mas->max);
 }
 
 /*