Message ID | 20240831001053.4751-1-richard.weiyang@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [1/3] maple_tree: use ma_data_end() in mas_data_end() | expand |
* Wei Yang <richard.weiyang@gmail.com> [240830 20:11]: > These two function share almost the same code and do the same thing. Please stop trying to optimise code like this. I don't have time to verify you are not making things worse and you haven't done any of the testing required. I went through with perf to optimise dereferences and static inline functions, so unless you are prepared to look at the generated code and actually benchmark, please stop sending patches that I need to verify like this myself. > > Let's just call ma_data_end() in mas_data_end() to reduce duplicate > code. > > Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > --- > lib/maple_tree.c | 22 ++++------------------ > 1 file changed, 4 insertions(+), 18 deletions(-) > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c > index b7d747a7938e..85668246f944 100644 > --- a/lib/maple_tree.c > +++ b/lib/maple_tree.c > @@ -1435,28 +1435,14 @@ static __always_inline unsigned char ma_data_end(struct maple_node *node, > */ > static inline unsigned char mas_data_end(struct ma_state *mas) > { > - enum maple_type type; > - struct maple_node *node; > - unsigned char offset; > - unsigned long *pivots; > - > - type = mte_node_type(mas->node); > - node = mas_mn(mas); > - if (type == maple_arange_64) > - return ma_meta_end(node, type); > + enum maple_type type = mte_node_type(mas->node); > + struct maple_node *node = mas_mn(mas); > + unsigned long *pivots = ma_pivots(node, type); > > - pivots = ma_pivots(node, type); > if (unlikely(ma_dead_node(node))) > return 0; > > - offset = mt_pivots[type] - 1; > - if (likely(!pivots[offset])) > - return ma_meta_end(node, type); > - > - if (likely(pivots[offset] == mas->max)) > - return offset; > - > - return mt_pivots[type]; > + return ma_data_end(node, type, pivots, mas->max); > } > > /* > -- > 2.34.1 > >
On Tue, Sep 03, 2024 at 12:12:39PM -0400, Liam R. Howlett wrote: >* Wei Yang <richard.weiyang@gmail.com> [240830 20:11]: >> These two function share almost the same code and do the same thing. > >Please stop trying to optimise code like this. I don't have time to >verify you are not making things worse and you haven't done any of the >testing required. > >I went through with perf to optimise dereferences and static inline >functions, so unless you are prepared to look at the generated code and >actually benchmark, please stop sending patches that I need to verify >like this myself. > Would you mind letting me know how could I verify the generated code and benchmark? What you expect to do? For example this one, these two functions share the same code. What's your concern for this? The inline function expansion? >> >> Let's just call ma_data_end() in mas_data_end() to reduce duplicate >> code. >> >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com> >> --- >> lib/maple_tree.c | 22 ++++------------------ >> 1 file changed, 4 insertions(+), 18 deletions(-) >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >> index b7d747a7938e..85668246f944 100644 >> --- a/lib/maple_tree.c >> +++ b/lib/maple_tree.c >> @@ -1435,28 +1435,14 @@ static __always_inline unsigned char ma_data_end(struct maple_node *node, >> */ >> static inline unsigned char mas_data_end(struct ma_state *mas) >> { >> - enum maple_type type; >> - struct maple_node *node; >> - unsigned char offset; >> - unsigned long *pivots; >> - >> - type = mte_node_type(mas->node); >> - node = mas_mn(mas); >> - if (type == maple_arange_64) >> - return ma_meta_end(node, type); >> + enum maple_type type = mte_node_type(mas->node); >> + struct maple_node *node = mas_mn(mas); >> + unsigned long *pivots = ma_pivots(node, type); >> >> - pivots = ma_pivots(node, type); >> if (unlikely(ma_dead_node(node))) >> return 0; >> >> - offset = mt_pivots[type] - 1; >> - if (likely(!pivots[offset])) >> - return ma_meta_end(node, type); >> - >> - if (likely(pivots[offset] == mas->max)) >> - return offset; >> - >> - return mt_pivots[type]; >> + return ma_data_end(node, type, pivots, mas->max); >> } >> >> /* >> -- >> 2.34.1 >> >>
* Wei Yang <richard.weiyang@gmail.com> [240903 20:15]: > On Tue, Sep 03, 2024 at 12:12:39PM -0400, Liam R. Howlett wrote: > >* Wei Yang <richard.weiyang@gmail.com> [240830 20:11]: > >> These two function share almost the same code and do the same thing. > > > >Please stop trying to optimise code like this. I don't have time to > >verify you are not making things worse and you haven't done any of the > >testing required. > > > >I went through with perf to optimise dereferences and static inline > >functions, so unless you are prepared to look at the generated code and > >actually benchmark, please stop sending patches that I need to verify > >like this myself. > > > > Would you mind letting me know how could I verify the generated code and > benchmark? What you expect to do? There are a number of benchmarks in the test code that will run specific tasks in iterations to check that there is not a regression. Depending on what you change, you have to run the right one. Usually I write one if one doesn't exist. You can also use perf record and see what happens on specific tests. This function is all over the place, so I really don't know which one to run, and I don't have time to write a test for your changes - and I don't think this is worth it. > > For example this one, these two functions share the same code. Do they? You have removed the fast path for maple_arange_64 using the metadata end before getting the pivots, which means on basically all vma walks we will be adding instructions to the walking of the tree (get the pivots, check if the node is dead, check if the pivots pointer is null). You have also added a check for if (!pivots), which is essentially checking if a dead node was hit - which is already checked in the mas_data_end(). These two checks are the reasons I left both copies of this function in place. I am looking to reduce the execution time, not decrease the line count of the file. So, the last 8 lines of both functions are the same. And please don't submit a patch that adds an inline function that does the last 8 lines to reduce duplication. > What's your > concern for this? The inline function expansion? Your clean up is compressing setting variables to the same line, which is a bad change. It is better to have verbose code where it won't make a difference in what is compiled. And certainly not worth adding after the fact. The inline of the mas_data_end() function depends on the expansion happening by the compiler (which might change based on the version or if it's gcc vs clang), sure that's a bit of a concern. The biggest annoying thing about this whole patch series (without a cover letter) is that it isn't doing anything for fixing, helping, or adding functionality. I'm a big fan of forward progress and this isn't it. It is only changing code for the sake of changing code. And it looks like it will be slower, or the same speed if we are lucky. I have to take time to verify things aren't slower or add subtle issues (maybe an RCU race) because the code looked similar. It's just not worth it. > > >> > >> Let's just call ma_data_end() in mas_data_end() to reduce duplicate > >> code. > >> > >> Signed-off-by: Wei Yang <richard.weiyang@gmail.com> > >> --- > >> lib/maple_tree.c | 22 ++++------------------ > >> 1 file changed, 4 insertions(+), 18 deletions(-) > >> > >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c > >> index b7d747a7938e..85668246f944 100644 > >> --- a/lib/maple_tree.c > >> +++ b/lib/maple_tree.c > >> @@ -1435,28 +1435,14 @@ static __always_inline unsigned char ma_data_end(struct maple_node *node, > >> */ > >> static inline unsigned char mas_data_end(struct ma_state *mas) > >> { > >> - enum maple_type type; > >> - struct maple_node *node; > >> - unsigned char offset; > >> - unsigned long *pivots; > >> - > >> - type = mte_node_type(mas->node); > >> - node = mas_mn(mas); > >> - if (type == maple_arange_64) > >> - return ma_meta_end(node, type); > >> + enum maple_type type = mte_node_type(mas->node); > >> + struct maple_node *node = mas_mn(mas); > >> + unsigned long *pivots = ma_pivots(node, type); > >> > >> - pivots = ma_pivots(node, type); > >> if (unlikely(ma_dead_node(node))) > >> return 0; > >> > >> - offset = mt_pivots[type] - 1; > >> - if (likely(!pivots[offset])) > >> - return ma_meta_end(node, type); > >> - > >> - if (likely(pivots[offset] == mas->max)) > >> - return offset; > >> - > >> - return mt_pivots[type]; > >> + return ma_data_end(node, type, pivots, mas->max); > >> } > >> > >> /* > >> -- > >> 2.34.1 > >> > >> > > -- > Wei Yang > Help you, Help me
On Tue, Sep 03, 2024 at 10:25:10PM -0400, Liam R. Howlett wrote: >* Wei Yang <richard.weiyang@gmail.com> [240903 20:15]: >> On Tue, Sep 03, 2024 at 12:12:39PM -0400, Liam R. Howlett wrote: >> >* Wei Yang <richard.weiyang@gmail.com> [240830 20:11]: >> >> These two function share almost the same code and do the same thing. >> > >> >Please stop trying to optimise code like this. I don't have time to >> >verify you are not making things worse and you haven't done any of the >> >testing required. >> > >> >I went through with perf to optimise dereferences and static inline >> >functions, so unless you are prepared to look at the generated code and >> >actually benchmark, please stop sending patches that I need to verify >> >like this myself. >> > >> >> Would you mind letting me know how could I verify the generated code and >> benchmark? What you expect to do? > Liam, Thanks for your patient reply. >There are a number of benchmarks in the test code that will run specific >tasks in iterations to check that there is not a regression. Depending >on what you change, you have to run the right one. Usually I write one >if one doesn't exist. Not familiar with benchmarks. Would you mind naming someone? or the ones you have written? So that I can learn to test or write one later. > >You can also use perf record and see what happens on specific tests. > >This function is all over the place, so I really don't know which one to >run, and I don't have time to write a test for your changes - and I >don't think this is worth it. > >> >> For example this one, these two functions share the same code. > >Do they? > >You have removed the fast path for maple_arange_64 using the metadata >end before getting the pivots, which means on basically all vma walks we >will be adding instructions to the walking of the tree (get the pivots, >check if the node is dead, check if the pivots pointer is null). You >have also added a check for if (!pivots), which is essentially checking >if a dead node was hit - which is already checked in the mas_data_end(). >These two checks are the reasons I left both copies of this function in >place. I am looking to reduce the execution time, not decrease the line >count of the file. > You are right, I should pay more attention to the fast path opt. >So, the last 8 lines of both functions are the same. And please don't >submit a patch that adds an inline function that does the last 8 lines >to reduce duplication. > Ok, I will not try to call an inline function just to reduce some similar code. >> What's your >> concern for this? The inline function expansion? > >Your clean up is compressing setting variables to the same line, which >is a bad change. It is better to have verbose code where it won't make >a difference in what is compiled. And certainly not worth adding after >the fact. > >The inline of the mas_data_end() function depends on the expansion >happening by the compiler (which might change based on the version or if >it's gcc vs clang), sure that's a bit of a concern. > >The biggest annoying thing about this whole patch series (without a >cover letter) is that it isn't doing anything for fixing, helping, or >adding functionality. I'm a big fan of forward progress and this isn't >it. > >It is only changing code for the sake of changing code. And it looks >like it will be slower, or the same speed if we are lucky. I have to >take time to verify things aren't slower or add subtle issues (maybe an >RCU race) because the code looked similar. It's just not worth it. > I am trying to make the code more easy to read, but seems not helping. BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access the first parent, parent's type and its gap twice. Once in mas_update_gap() and once in mas_parent_gap(). Do you think it worth a change to reduce one?
On Wed, Sep 04, 2024 at 07:58:19AM +0000, Wei Yang wrote: [...] >>It is only changing code for the sake of changing code. And it looks >>like it will be slower, or the same speed if we are lucky. I have to >>take time to verify things aren't slower or add subtle issues (maybe an >>RCU race) because the code looked similar. It's just not worth it. >> > >I am trying to make the code more easy to read, but seems not helping. > >BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access >the first parent, parent's type and its gap twice. Once in mas_update_gap() >and once in mas_parent_gap(). > >Do you think it worth a change to reduce one? > Liam, I am trying to understand what kind code change you don't like. Is the following change worth? diff --git a/lib/maple_tree.c b/lib/maple_tree.c index 2b310dd3addf..e331d086eb7c 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -1595,32 +1595,33 @@ static inline unsigned long mas_max_gap(struct ma_state *mas) /* * mas_parent_gap() - Set the parent gap and any gaps above, as needed * @mas: The maple state - * @offset: The gap offset in the parent to set * @new: The new gap value. * * Set the parent gap then continue to set the gap upwards, using the metadata * of the parent to see if it is necessary to check the node above. */ -static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, - unsigned long new) +static inline void mas_parent_gap(struct ma_state *mas, unsigned long new) { unsigned long meta_gap = 0; struct maple_node *pnode; - struct maple_enode *penode; + struct maple_enode *enode = mas->node; unsigned long *pgaps; - unsigned char meta_offset; + unsigned char offset, meta_offset; enum maple_type pmt; - pnode = mte_parent(mas->node); - pmt = mas_parent_type(mas, mas->node); - penode = mt_mk_node(pnode, pmt); +ascend: + pnode = mte_parent(enode); + pmt = mas_parent_type(mas, enode); + offset = mte_parent_slot(enode); pgaps = ma_gaps(pnode, pmt); -ascend: MAS_BUG_ON(mas, pmt != maple_arange_64); meta_offset = ma_meta_gap(pnode); meta_gap = pgaps[meta_offset]; + if (pgaps[offset] == new) + return; + pgaps[offset] = new; if (meta_gap == new) @@ -1640,11 +1641,7 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, return; /* Go to the parent node. */ - pnode = mte_parent(penode); - pmt = mas_parent_type(mas, penode); - pgaps = ma_gaps(pnode, pmt); - offset = mte_parent_slot(penode); - penode = mt_mk_node(pnode, pmt); + enode = mt_mk_node(pnode, pmt); goto ascend; } @@ -1654,24 +1651,13 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, */ static inline void mas_update_gap(struct ma_state *mas) { - unsigned char pslot; - unsigned long p_gap; - unsigned long max_gap; - if (!mt_is_alloc(mas->tree)) return; if (mte_is_root(mas->node)) return; - max_gap = mas_max_gap(mas); - - pslot = mte_parent_slot(mas->node); - p_gap = ma_gaps(mte_parent(mas->node), - mas_parent_type(mas, mas->node))[pslot]; - - if (p_gap != max_gap) - mas_parent_gap(mas, pslot, max_gap); + mas_parent_gap(mas, mas_max_gap(mas)); } /*
* Wei Yang <richard.weiyang@gmail.com> [240904 10:53]: > On Wed, Sep 04, 2024 at 07:58:19AM +0000, Wei Yang wrote: > [...] > >>It is only changing code for the sake of changing code. And it looks > >>like it will be slower, or the same speed if we are lucky. I have to > >>take time to verify things aren't slower or add subtle issues (maybe an > >>RCU race) because the code looked similar. It's just not worth it. > >> > > > >I am trying to make the code more easy to read, but seems not helping. > > > >BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access > >the first parent, parent's type and its gap twice. Once in mas_update_gap() > >and once in mas_parent_gap(). > > > >Do you think it worth a change to reduce one? > > > > Liam, > > I am trying to understand what kind code change you don't like. Any change that cleans things up and verifies the performance would be acceptable, but your previous changes didn't do that so the burden is on me to verify that your code isn't going to cause a regression. > > Is the following change worth? Not like it is right now, but this is worth fixing. Test using the tools/test/radix-tree/maple.c Look in that file for BENCH defines at the top, and examples of the benchmarking. Before you submit anything, run the testing to make sure it all passes. If you are changing anything for cleanup/optimisation, then write a benchmark that will test the runtime, add that before your change and run it in both before/after with the results. Look also into the perf tool to see what is going on and where your time is spent, then you can avoid optimising something that's not worth optimising. > > diff --git a/lib/maple_tree.c b/lib/maple_tree.c > index 2b310dd3addf..e331d086eb7c 100644 > --- a/lib/maple_tree.c > +++ b/lib/maple_tree.c > @@ -1595,32 +1595,33 @@ static inline unsigned long mas_max_gap(struct ma_state *mas) > /* > * mas_parent_gap() - Set the parent gap and any gaps above, as needed > * @mas: The maple state > - * @offset: The gap offset in the parent to set > * @new: The new gap value. > * > * Set the parent gap then continue to set the gap upwards, using the metadata > * of the parent to see if it is necessary to check the node above. > */ > -static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, > - unsigned long new) > +static inline void mas_parent_gap(struct ma_state *mas, unsigned long new) > { > unsigned long meta_gap = 0; > struct maple_node *pnode; > - struct maple_enode *penode; > + struct maple_enode *enode = mas->node; Set this with the rest of the configuration, before the ascend label would make more sense and be more clear. > unsigned long *pgaps; > - unsigned char meta_offset; > + unsigned char offset, meta_offset; Make this two lines. > enum maple_type pmt; > > - pnode = mte_parent(mas->node); > - pmt = mas_parent_type(mas, mas->node); > - penode = mt_mk_node(pnode, pmt); > +ascend: > + pnode = mte_parent(enode); > + pmt = mas_parent_type(mas, enode); > + offset = mte_parent_slot(enode); > pgaps = ma_gaps(pnode, pmt); > > -ascend: > MAS_BUG_ON(mas, pmt != maple_arange_64); > meta_offset = ma_meta_gap(pnode); > meta_gap = pgaps[meta_offset]; > > + if (pgaps[offset] == new) > + return; > + So every time we go up a level in the tree, we will now check if the offset gap is the same as the new gap? Doesn't this mean every level now has an extra check that was previously only done for the first level? This is an unreasonable trade off. I don't like what is there today, but I don't see this as a meaningful improvement and I suspect this extra check is going to cost more than the extra hot-cache check that exists today. > pgaps[offset] = new; > > if (meta_gap == new) > @@ -1640,11 +1641,7 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, > return; > > /* Go to the parent node. */ > - pnode = mte_parent(penode); > - pmt = mas_parent_type(mas, penode); > - pgaps = ma_gaps(pnode, pmt); > - offset = mte_parent_slot(penode); > - penode = mt_mk_node(pnode, pmt); > + enode = mt_mk_node(pnode, pmt); Essentially, you have replaced the penode with enode and reversed the order of setting things up. This seems reasonable, as it reduces the lines of code. If you undo this change, you can move the check back outside the loop and avoid checking it every iteration and avoid the double check you are trying to avoid. > goto ascend; > } > > @@ -1654,24 +1651,13 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, > */ > static inline void mas_update_gap(struct ma_state *mas) > { > - unsigned char pslot; > - unsigned long p_gap; > - unsigned long max_gap; > - > if (!mt_is_alloc(mas->tree)) > return; > > if (mte_is_root(mas->node)) > return; > > - max_gap = mas_max_gap(mas); > - > - pslot = mte_parent_slot(mas->node); > - p_gap = ma_gaps(mte_parent(mas->node), > - mas_parent_type(mas, mas->node))[pslot]; > - > - if (p_gap != max_gap) > - mas_parent_gap(mas, pslot, max_gap); > + mas_parent_gap(mas, mas_max_gap(mas)); This was optimised to skip updating the parent if we don't need to - and the parent update was called elsewhere in the past. Since there's not much here, we can probably delete this function and rename the mas_parent_gap() with the two tests here for mt_is_alloc() and mte_is_root(). ... Thanks, Liam
On Thu, Sep 05, 2024 at 04:13:15PM -0400, Liam R. Howlett wrote: >* Wei Yang <richard.weiyang@gmail.com> [240904 10:53]: >> On Wed, Sep 04, 2024 at 07:58:19AM +0000, Wei Yang wrote: >> [...] >> >>It is only changing code for the sake of changing code. And it looks >> >>like it will be slower, or the same speed if we are lucky. I have to >> >>take time to verify things aren't slower or add subtle issues (maybe an >> >>RCU race) because the code looked similar. It's just not worth it. >> >> >> > >> >I am trying to make the code more easy to read, but seems not helping. >> > >> >BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access >> >the first parent, parent's type and its gap twice. Once in mas_update_gap() >> >and once in mas_parent_gap(). >> > >> >Do you think it worth a change to reduce one? >> > >> >> Liam, >> >> I am trying to understand what kind code change you don't like. > >Any change that cleans things up and verifies the performance would be >acceptable, but your previous changes didn't do that so the burden is on >me to verify that your code isn't going to cause a regression. > Thanks for your reply. It contains much information which I am trying to digest. If I misunderstand, please let me know. >> >> Is the following change worth? > >Not like it is right now, but this is worth fixing. > >Test using the tools/test/radix-tree/maple.c Look in that file for >BENCH defines at the top, and examples of the benchmarking. > I guess I could run them by comment out those #define at the beginning of lib/test_maple_tree.c? I have comment out one of it, what I get is: TEST STARTING maple_tree: 80000597 of 80000597 tests passed My question is how we leverage this test case? From the output itself I just see all passed, but I am not sure it breaks the benchmark or not. >Before you submit anything, run the testing to make sure it all passes. > Yes, I make and run ./maple for each change. >If you are changing anything for cleanup/optimisation, then write a >benchmark that will test the runtime, add that before your change and >run it in both before/after with the results. > I guess is to add a #define BENCH_XXX with related function and call it in maple_tree_seed(), right? >Look also into the perf tool to see what is going on and where your time >is spent, then you can avoid optimising something that's not worth >optimising. > This is use the perf tool on the new added benchmark test? I am lack of the experience of perf tool. I may need to spend some time to use it. Usually we begin with 'perf record ./maple', right? >> >> diff --git a/lib/maple_tree.c b/lib/maple_tree.c >> index 2b310dd3addf..e331d086eb7c 100644 >> --- a/lib/maple_tree.c >> +++ b/lib/maple_tree.c >> @@ -1595,32 +1595,33 @@ static inline unsigned long mas_max_gap(struct ma_state *mas) >> /* >> * mas_parent_gap() - Set the parent gap and any gaps above, as needed >> * @mas: The maple state >> - * @offset: The gap offset in the parent to set >> * @new: The new gap value. >> * >> * Set the parent gap then continue to set the gap upwards, using the metadata >> * of the parent to see if it is necessary to check the node above. >> */ >> -static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, >> - unsigned long new) >> +static inline void mas_parent_gap(struct ma_state *mas, unsigned long new) >> { >> unsigned long meta_gap = 0; >> struct maple_node *pnode; >> - struct maple_enode *penode; >> + struct maple_enode *enode = mas->node; > >Set this with the rest of the configuration, before the ascend label >would make more sense and be more clear. > >> unsigned long *pgaps; >> - unsigned char meta_offset; >> + unsigned char offset, meta_offset; > >Make this two lines. > >> enum maple_type pmt; >> >> - pnode = mte_parent(mas->node); >> - pmt = mas_parent_type(mas, mas->node); >> - penode = mt_mk_node(pnode, pmt); >> +ascend: >> + pnode = mte_parent(enode); >> + pmt = mas_parent_type(mas, enode); >> + offset = mte_parent_slot(enode); >> pgaps = ma_gaps(pnode, pmt); >> >> -ascend: >> MAS_BUG_ON(mas, pmt != maple_arange_64); >> meta_offset = ma_meta_gap(pnode); >> meta_gap = pgaps[meta_offset]; >> >> + if (pgaps[offset] == new) >> + return; >> + > >So every time we go up a level in the tree, we will now check if the >offset gap is the same as the new gap? Doesn't this mean every level >now has an extra check that was previously only done for the first >level? This is an unreasonable trade off. Yes, this seems not very good. > >I don't like what is there today, but I don't see this as a meaningful >improvement and I suspect this extra check is going to cost more than >the extra hot-cache check that exists today. > One thing I found is we may have this case, when we walk up the tree. This happens when we have a node with two slots have the same max gap. This means offset != meta_offset, but pgaps[offset] == pgaps[meta_offset]. Even we decrease pgaps[offset], the max gap of this node is not changed. So we don't need to change gap in parent . Currently we would first assign the same value then return because gap isn't change. With this change, we first compare then set if not equal. But this case is very rare if I am correct. >> pgaps[offset] = new; >> >> if (meta_gap == new) >> @@ -1640,11 +1641,7 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, >> return; >> >> /* Go to the parent node. */ >> - pnode = mte_parent(penode); >> - pmt = mas_parent_type(mas, penode); >> - pgaps = ma_gaps(pnode, pmt); >> - offset = mte_parent_slot(penode); >> - penode = mt_mk_node(pnode, pmt); >> + enode = mt_mk_node(pnode, pmt); > >Essentially, you have replaced the penode with enode and reversed the >order of setting things up. This seems reasonable, as it reduces the >lines of code. > >If you undo this change, you can move the check back outside the loop >and avoid checking it every iteration and avoid the double check you are >trying to avoid. > >> goto ascend; >> } >> >> @@ -1654,24 +1651,13 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, >> */ >> static inline void mas_update_gap(struct ma_state *mas) >> { >> - unsigned char pslot; >> - unsigned long p_gap; >> - unsigned long max_gap; >> - >> if (!mt_is_alloc(mas->tree)) >> return; >> >> if (mte_is_root(mas->node)) >> return; >> >> - max_gap = mas_max_gap(mas); >> - >> - pslot = mte_parent_slot(mas->node); >> - p_gap = ma_gaps(mte_parent(mas->node), >> - mas_parent_type(mas, mas->node))[pslot]; >> - >> - if (p_gap != max_gap) >> - mas_parent_gap(mas, pslot, max_gap); >> + mas_parent_gap(mas, mas_max_gap(mas)); > >This was optimised to skip updating the parent if we don't need to - and >the parent update was called elsewhere in the past. Since there's not >much here, we can probably delete this function and rename the >mas_parent_gap() with the two tests here for mt_is_alloc() and >mte_is_root(). > I guess your suggestion is to merge mas_update_gap() and mas_parent_gap(). Here is the drat, hope I get your idea. diff --git a/lib/maple_tree.c b/lib/maple_tree.c index f81e3946b1d5..c04718e6acfa 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -1593,28 +1593,34 @@ static inline unsigned long mas_max_gap(struct ma_state *mas) } /* - * mas_parent_gap() - Set the parent gap and any gaps above, as needed - * @mas: The maple state - * @offset: The gap offset in the parent to set - * @new: The new gap value. - * - * Set the parent gap then continue to set the gap upwards, using the metadata - * of the parent to see if it is necessary to check the node above. + * mas_update_gap() - Update a nodes gaps and propagate up if necessary. + * @mas: the maple state. */ -static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, - unsigned long new) +static inline void mas_update_gap(struct ma_state *mas) { unsigned long meta_gap = 0; struct maple_node *pnode; - struct maple_enode *penode; + struct maple_enode *enode; + unsigned long new; unsigned long *pgaps; unsigned char meta_offset; + unsigned char offset; enum maple_type pmt; + if (!mt_is_alloc(mas->tree)) + return; + + if (mte_is_root(mas->node)) + return; + + new = mas_max_gap(mas); pnode = mte_parent(mas->node); pmt = mas_parent_type(mas, mas->node); - penode = mt_mk_node(pnode, pmt); pgaps = ma_gaps(pnode, pmt); + offset = mte_parent_slot(mas->node); + + if (pgaps[offset] == new) + return; ascend: MAS_BUG_ON(mas, pmt != maple_arange_64); @@ -1640,38 +1646,13 @@ static inline void mas_parent_gap(struct ma_state *mas, unsigned char offset, return; /* Go to the parent node. */ - pnode = mte_parent(penode); - pmt = mas_parent_type(mas, penode); + enode = mt_mk_node(pnode, pmt); + pnode = mte_parent(enode); + pmt = mas_parent_type(mas, enode); pgaps = ma_gaps(pnode, pmt); - offset = mte_parent_slot(penode); - penode = mt_mk_node(pnode, pmt); - goto ascend; -} + offset = mte_parent_slot(enode); -/* - * mas_update_gap() - Update a nodes gaps and propagate up if necessary. - * @mas: the maple state. - */ -static inline void mas_update_gap(struct ma_state *mas) -{ - unsigned char pslot; - unsigned long p_gap; - unsigned long max_gap; - - if (!mt_is_alloc(mas->tree)) - return; - - if (mte_is_root(mas->node)) - return; - - max_gap = mas_max_gap(mas); - - pslot = mte_parent_slot(mas->node); - p_gap = ma_gaps(mte_parent(mas->node), - mas_parent_type(mas, mas->node))[pslot]; - - if (p_gap != max_gap) - mas_parent_gap(mas, pslot, max_gap); + goto ascend; } /*
On Fri, Sep 06, 2024 at 03:44:10AM +0000, Wei Yang wrote: >On Thu, Sep 05, 2024 at 04:13:15PM -0400, Liam R. Howlett wrote: >>* Wei Yang <richard.weiyang@gmail.com> [240904 10:53]: >>> On Wed, Sep 04, 2024 at 07:58:19AM +0000, Wei Yang wrote: >>> [...] >>> >>It is only changing code for the sake of changing code. And it looks >>> >>like it will be slower, or the same speed if we are lucky. I have to >>> >>take time to verify things aren't slower or add subtle issues (maybe an >>> >>RCU race) because the code looked similar. It's just not worth it. >>> >> >>> > >>> >I am trying to make the code more easy to read, but seems not helping. >>> > >>> >BTW, I found in mas_update_gap(), if (p_gap != max_gap), we would access >>> >the first parent, parent's type and its gap twice. Once in mas_update_gap() >>> >and once in mas_parent_gap(). >>> > >>> >Do you think it worth a change to reduce one? >>> > >>> >>> Liam, >>> >>> I am trying to understand what kind code change you don't like. >> >>Any change that cleans things up and verifies the performance would be >>acceptable, but your previous changes didn't do that so the burden is on >>me to verify that your code isn't going to cause a regression. >> > >Thanks for your reply. It contains much information which I am trying to >digest. If I misunderstand, please let me know. > >>> >>> Is the following change worth? >> >>Not like it is right now, but this is worth fixing. >> >>Test using the tools/test/radix-tree/maple.c Look in that file for >>BENCH defines at the top, and examples of the benchmarking. >> > >I guess I could run them by comment out those #define at the beginning of >lib/test_maple_tree.c? > >I have comment out one of it, what I get is: > > TEST STARTING > > maple_tree: 80000597 of 80000597 tests passed > >My question is how we leverage this test case? From the output itself >I just see all passed, but I am not sure it breaks the benchmark or not. > >>Before you submit anything, run the testing to make sure it all passes. >> > >Yes, I make and run ./maple for each change. > >>If you are changing anything for cleanup/optimisation, then write a >>benchmark that will test the runtime, add that before your change and >>run it in both before/after with the results. >> > >I guess is to add a #define BENCH_XXX with related function and call it in >maple_tree_seed(), right? > >>Look also into the perf tool to see what is going on and where your time >>is spent, then you can avoid optimising something that's not worth >>optimising. >> > >This is use the perf tool on the new added benchmark test? > >I am lack of the experience of perf tool. I may need to spend some time to use >it. Usually we begin with 'perf record ./maple', right? > Liam, If you have some time, would you mind telling me the correct way to use the benchmark? So that I can do the correct verification as you expected.
* Wei Yang <richard.weiyang@gmail.com> [240911 19:15]: ... > > Liam, > > If you have some time, would you mind telling me the correct way to use the > benchmark? > > So that I can do the correct verification as you expected. I don't have time for that right now, sorry. I'll try to get back to this when I can. Thanks, Liam
On Fri, Sep 13, 2024 at 10:13:24AM -0400, Liam R. Howlett wrote: >* Wei Yang <richard.weiyang@gmail.com> [240911 19:15]: >... >> >> Liam, >> >> If you have some time, would you mind telling me the correct way to use the >> benchmark? >> >> So that I can do the correct verification as you expected. > >I don't have time for that right now, sorry. > >I'll try to get back to this when I can. > Got it, thanks. >Thanks, >Liam
diff --git a/lib/maple_tree.c b/lib/maple_tree.c index b7d747a7938e..85668246f944 100644 --- a/lib/maple_tree.c +++ b/lib/maple_tree.c @@ -1435,28 +1435,14 @@ static __always_inline unsigned char ma_data_end(struct maple_node *node, */ static inline unsigned char mas_data_end(struct ma_state *mas) { - enum maple_type type; - struct maple_node *node; - unsigned char offset; - unsigned long *pivots; - - type = mte_node_type(mas->node); - node = mas_mn(mas); - if (type == maple_arange_64) - return ma_meta_end(node, type); + enum maple_type type = mte_node_type(mas->node); + struct maple_node *node = mas_mn(mas); + unsigned long *pivots = ma_pivots(node, type); - pivots = ma_pivots(node, type); if (unlikely(ma_dead_node(node))) return 0; - offset = mt_pivots[type] - 1; - if (likely(!pivots[offset])) - return ma_meta_end(node, type); - - if (likely(pivots[offset] == mas->max)) - return offset; - - return mt_pivots[type]; + return ma_data_end(node, type, pivots, mas->max); } /*
These two function share almost the same code and do the same thing. Let's just call ma_data_end() in mas_data_end() to reduce duplicate code. Signed-off-by: Wei Yang <richard.weiyang@gmail.com> --- lib/maple_tree.c | 22 ++++------------------ 1 file changed, 4 insertions(+), 18 deletions(-)