Message ID: 20241020024628.22469-2-richard.weiyang@gmail.com (mailing list archive)
State: New
Series: maple_tree: current split may result in deficient node
* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> The current split may result in a deficient node in rare cases.
>
> For example:
>
> 	mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE);
> 	for (count = 0; count < 10; count++) {
> 		mas_set(&mas, count);
> 		mas_store(&mas, xa_mk_value(count));
> 	}
>
> 	for (count = 20; count < 39; count++) {
> 		mas_set(&mas, count);
> 		mas_store(&mas, xa_mk_value(count));
> 	}
>
> 	for (count = 10; count < 12; count++) {
> 		mas_set(&mas, count);
> 		mas_store(&mas, xa_mk_value(count));
> 	}
> 	mt_validate(mt);
>
> The validation would report a deficient node.

I think you can explain it better than this?

If we fill a left node while the right node is already full, the split of
the left node will result in the new middle node being insufficient.

>
> The reason is we don't leave enough room for the right node.

The reason is we don't leave enough data for the node on the right of
the split.  The node on the right has too much room from what I see?

>
> The deficiency check is (end < mt_min_slots[]), which means a node must
> have at least (mt_min_slots[] + 1) entries.  The right node's index
> range is [split + 1, b_end], which means the number of entries in the
> right node is (b_end - (split + 1) + 1) = (b_end - split).
>
> Then the check between the number of entries and (mt_min_slots[] + 1) is
> (b_end - split) > (mt_min_slots[] + 1), which leads to
> (b_end - split - 1) > (mt_min_slots[]).

Don't state the code, it's stated below.

I am still concerned about jitter that this patch set may cause.

>
> Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
> CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
> CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
> CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
> ---
>  lib/maple_tree.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/lib/maple_tree.c b/lib/maple_tree.c
> index 1205a5208cfe..894dc5e9436e 100644
> --- a/lib/maple_tree.c
> +++ b/lib/maple_tree.c
> @@ -1831,7 +1831,7 @@ static inline int mab_no_null_split(struct maple_big_node *b_node,
>  	 * still be sufficient, then increment the split on NULL.
>  	 */
>  	if ((split < slot_count - 1) &&
> -	    (b_node->b_end - split) > (mt_min_slots[b_node->type]))
> +	    (b_node->b_end - split - 1) > (mt_min_slots[b_node->type]))
>  		split++;
>  	else
>  		split--;
> @@ -1896,7 +1896,7 @@ static inline int mab_calc_split(struct ma_state *mas,
>  	 */
>  	while ((split < slot_count - 1) &&
>  	       ((bn->pivot[split] - min) < slot_count - 1) &&
> -	       (b_end - split > slot_min))
> +	       (b_end - split - 1 > slot_min))
>  		split++;
>  	}
>
> --
> 2.34.1
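For illustration, the reproducer above can be rewritten against the public
mtree_*() API so it drops into the userspace maple tree harness
(tools/testing/radix-tree) or a lib/test_maple_tree.c-style test.  This is
only a sketch, not part of the posted patch: the function name is made up
here, and mt_validate() only reports anything when CONFIG_DEBUG_MAPLE_TREE
is enabled.

#include <linux/maple_tree.h>
#include <linux/xarray.h>

/* Hypothetical helper; mirrors the reproducer in the commit message. */
static void check_deficient_split(struct maple_tree *mt)
{
	unsigned long i;

	mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE);

	/* Fill what will become the left-most leaf. */
	for (i = 0; i < 10; i++)
		mtree_store(mt, i, xa_mk_value(i), GFP_KERNEL);

	/* Populate the right side so the neighbouring leaf is already full. */
	for (i = 20; i < 39; i++)
		mtree_store(mt, i, xa_mk_value(i), GFP_KERNEL);

	/* Grow the left node until it splits. */
	for (i = 10; i < 12; i++)
		mtree_store(mt, i, xa_mk_value(i), GFP_KERNEL);

	/* With CONFIG_DEBUG_MAPLE_TREE, this flags the deficient node. */
	mt_validate(mt);
	mtree_destroy(mt);
}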
On Sun, Oct 20, 2024 at 05:55:10PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
>> The current split may result in a deficient node in rare cases.
>>
[...]
>> The validation would report a deficient node.
>
>I think you can explain it better than this?
>
>If we fill a left node while the right node is already full, the split of
>the left node will result in the new middle node being insufficient.
>

Thanks, I will add this explanation later.

>>
>> The reason is we don't leave enough room for the right node.
>
>The reason is we don't leave enough data for the node on the right of
>the split.  The node on the right has too much room from what I see?
>

Too much room? Or too much empty room? I may have missed something here.

>>
>> The deficiency check is (end < mt_min_slots[]), which means a node must
>> have at least (mt_min_slots[] + 1) entries.  The right node's index
>> range is [split + 1, b_end], which means the number of entries in the
>> right node is (b_end - (split + 1) + 1) = (b_end - split).
>>
>> Then the check between the number of entries and (mt_min_slots[] + 1) is
>> (b_end - split) > (mt_min_slots[] + 1), which leads to
>> (b_end - split - 1) > (mt_min_slots[]).
>
>Don't state the code, it's stated below.

Sure, I will remove this part.

>
>I am still concerned about jitter that this patch set may cause.

Per my understanding, the jitter is a removal that makes a node deficient
right after that node was created by a split, right?
On Sun, Oct 20, 2024 at 05:55:10PM -0400, Liam R. Howlett wrote:
>* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
[...]
>
>Don't state the code, it's stated below.
>
>I am still concerned about jitter that this patch set may cause.
>

If my understanding is correct, the change here won't cause worse jitter.

Below is the dump result when this case happens.  Let's name the three
depth-1 nodes n1, n2 and n3.

maple_tree(0x5611079e1680) flags 9, height 2 root 0x615000001c1e
0-18446744073709551615: node 0x615000001c00 depth 0 type 3 parent 0x5611079e1681 contents: 0 8 18446744073709551577 0 0 0 0 0 0 0 | 02 02| 0x61500000210c 10 0x615000001f0c 23 0x61500000120c 18446744073709551615 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil)
0-10: node 0x615000002100 depth 1 type 1 parent 0x615000001c06 contents: 0x1 0 0x3 1 0x5 2 0x7 3 0x9 4 0xb 5 0xd 6 0xf 7 0x11 8 0x13 9 0x15 10 (nil) 0 (nil) 0 (nil) 0 (nil) 0 0xa
 0: value 0 (0x0) [0x1]
 1: value 1 (0x1) [0x3]
 2: value 2 (0x2) [0x5]
 3: value 3 (0x3) [0x7]
 4: value 4 (0x4) [0x9]
 5: value 5 (0x5) [0xb]
 6: value 6 (0x6) [0xd]
 7: value 7 (0x7) [0xf]
 8: value 8 (0x8) [0x11]
 9: value 9 (0x9) [0x13]
 10: value 10 (0xa) [0x15]
11-23: node 0x615000001f00 depth 1 type 1 parent 0x615000001c0e contents: 0x17 11 (nil) 19 0x29 20 0x2b 21 0x2d 22 0x2f 23 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 (nil) 0 0x5
 11: value 11 (0xb) [0x17]
 12-19: (nil)
 20: value 20 (0x14) [0x29]
 21: value 21 (0x15) [0x2b]
 22: value 22 (0x16) [0x2d]
 23: value 23 (0x17) [0x2f]
24-18446744073709551615: node 0x615000001200 depth 1 type 1 parent 0x615000001c16 contents: 0x31 24 0x33 25 0x35 26 0x37 27 0x39 28 0x3b 29 0x3d 30 0x3f 31 0x41 32 0x43 33 0x45 34 0x47 35 0x49 36 0x4b 37 0x4d 38 (nil)
 24: value 24 (0x18) [0x31]
 25: value 25 (0x19) [0x33]
 26: value 26 (0x1a) [0x35]
 27: value 27 (0x1b) [0x37]
 28: value 28 (0x1c) [0x39]
 29: value 29 (0x1d) [0x3b]
 30: value 30 (0x1e) [0x3d]
 31: value 31 (0x1f) [0x3f]
 32: value 32 (0x20) [0x41]
 33: value 33 (0x21) [0x43]
 34: value 34 (0x22) [0x45]
 35: value 35 (0x23) [0x47]
 36: value 36 (0x24) [0x49]
 37: value 37 (0x25) [0x4b]
 38: value 38 (0x26) [0x4d]
39-18446744073709551615: (nil)

Since n2 is already deficient, removing an entry from n2 would need a
rebalance, which is a jitter if my understanding is correct.

After this change, removing an entry from n2 would also result in a
deficient node, so this is not worse than the current behavior.

Do you have other cases of concern?
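To make the "removing an entry from n2" point concrete, a hedged sketch of
such a probe follows.  The helper name and the choice of index 21 (one of
n2's entries in the dump above) are illustrative assumptions; whether the
erase triggers a rebalance or merely leaves another deficient node depends
on the exact layout and kernel version.

#include <linux/maple_tree.h>
#include <linux/xarray.h>

/* Hypothetical probe: rebuild the same layout, then erase from n2. */
static void probe_erase_from_n2(struct maple_tree *mt)
{
	unsigned long i;

	mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE);
	for (i = 0; i < 10; i++)
		mtree_store(mt, i, xa_mk_value(i), GFP_KERNEL);
	for (i = 20; i < 39; i++)
		mtree_store(mt, i, xa_mk_value(i), GFP_KERNEL);
	for (i = 10; i < 12; i++)
		mtree_store(mt, i, xa_mk_value(i), GFP_KERNEL);

	/*
	 * n2 covers [11, 23] in the dump.  Index 21 is an arbitrary pick;
	 * erasing it either forces a rebalance or leaves a deficient node,
	 * which mt_validate() flags under CONFIG_DEBUG_MAPLE_TREE.
	 */
	mtree_erase(mt, 21);
	mt_validate(mt);
	mtree_destroy(mt);
}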
* Wei Yang <richard.weiyang@gmail.com> [241107 21:31]:
> On Sun, Oct 20, 2024 at 05:55:10PM -0400, Liam R. Howlett wrote:
> >* Wei Yang <richard.weiyang@gmail.com> [241019 22:46]:
> [...]
> >
> >Don't state the code, it's stated below.
> >
> >I am still concerned about jitter that this patch set may cause.
> >
>
> If my understanding is correct, the change here won't cause worse jitter.
>
> Below is the dump result when this case happens.  Let's name the three
> depth-1 nodes n1, n2 and n3.
>
[...]
>
> Since n2 is already deficient, removing an entry from n2 would need a
> rebalance, which is a jitter if my understanding is correct.
>
> After this change, removing an entry from n2 would also result in a
> deficient node, so this is not worse than the current behavior.
>
> Do you have other cases of concern?

My concern comes when we have a node with 16 entries, then add one, then
delete one, then add one, then delete one, then add one, then delete
one...

I am not worried that you fixed the insufficient node; I am worried that
the splitting and rebalancing targets will become too strict and cause us
to allocate nodes over and over - this can be even worse if it causes a
tree expansion/contraction further up.

There are a few benchmarks that exercise this pattern more than it happens
in the real world, but it is still important to handle it well for real
workloads.

Thanks,
Liam
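The workload Liam describes can be sketched as a churn loop like the one
below.  The starting population of 16 entries and the churn index are
assumptions chosen only to show the shape of the pattern (hovering around a
node boundary), not a faithful reproduction of any benchmark.

#include <linux/maple_tree.h>
#include <linux/xarray.h>

/*
 * Hypothetical churn test: fill a leaf close to capacity, then
 * repeatedly add one entry and delete it again.  If the split and
 * rebalance thresholds are too strict, each iteration can bounce the
 * node between splitting and rebalancing, allocating nodes every time.
 */
static void churn_at_boundary(struct maple_tree *mt)
{
	unsigned long i, iter;

	mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE);
	for (i = 0; i < 16; i++)		/* assumed near-full leaf */
		mtree_store(mt, i, xa_mk_value(i), GFP_KERNEL);

	for (iter = 0; iter < 1000; iter++) {
		mtree_store(mt, 16, xa_mk_value(16), GFP_KERNEL);
		mtree_erase(mt, 16);
	}

	mt_validate(mt);
	mtree_destroy(mt);
}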
diff --git a/lib/maple_tree.c b/lib/maple_tree.c
index 1205a5208cfe..894dc5e9436e 100644
--- a/lib/maple_tree.c
+++ b/lib/maple_tree.c
@@ -1831,7 +1831,7 @@ static inline int mab_no_null_split(struct maple_big_node *b_node,
 	 * still be sufficient, then increment the split on NULL.
 	 */
 	if ((split < slot_count - 1) &&
-	    (b_node->b_end - split) > (mt_min_slots[b_node->type]))
+	    (b_node->b_end - split - 1) > (mt_min_slots[b_node->type]))
 		split++;
 	else
 		split--;
@@ -1896,7 +1896,7 @@ static inline int mab_calc_split(struct ma_state *mas,
 	 */
 	while ((split < slot_count - 1) &&
 	       ((bn->pivot[split] - min) < slot_count - 1) &&
-	       (b_end - split > slot_min))
+	       (b_end - split - 1 > slot_min))
 		split++;
 	}
The current split may result in a deficient node in rare cases.

For example:

	mt_init_flags(mt, MT_FLAGS_ALLOC_RANGE);
	for (count = 0; count < 10; count++) {
		mas_set(&mas, count);
		mas_store(&mas, xa_mk_value(count));
	}

	for (count = 20; count < 39; count++) {
		mas_set(&mas, count);
		mas_store(&mas, xa_mk_value(count));
	}

	for (count = 10; count < 12; count++) {
		mas_set(&mas, count);
		mas_store(&mas, xa_mk_value(count));
	}
	mt_validate(mt);

The validation would report a deficient node.

The reason is we don't leave enough room for the right node.

The deficiency check is (end < mt_min_slots[]), which means a node must
have at least (mt_min_slots[] + 1) entries.  The right node's index range
is [split + 1, b_end], which means the number of entries in the right node
is (b_end - (split + 1) + 1) = (b_end - split).

Then the check between the number of entries and (mt_min_slots[] + 1) is
(b_end - split) > (mt_min_slots[] + 1), which leads to
(b_end - split - 1) > (mt_min_slots[]).

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
CC: Liam R. Howlett <Liam.Howlett@Oracle.com>
CC: Sidhartha Kumar <sidhartha.kumar@oracle.com>
CC: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
---
 lib/maple_tree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
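To see what the extra "- 1" changes, here is a small standalone userspace
sketch of the arithmetic only.  The values slot_min = 6 and b_end = 16 are
assumed examples (slot_min stands in for a leaf's mt_min_slots[] entry),
and the loop models just the (b_end - split ...) term of mab_calc_split(),
ignoring the slot_count and pivot-span conditions.

#include <stdio.h>

/* Advance split as far as the old or new condition allows. */
static int last_split(int b_end, int slot_min, int with_fix)
{
	int split = 0;

	while (with_fix ? (b_end - split - 1 > slot_min)
			: (b_end - split > slot_min))
		split++;
	return split;
}

int main(void)
{
	const int b_end = 16, slot_min = 6;	/* assumed example values */

	for (int with_fix = 0; with_fix <= 1; with_fix++) {
		int split = last_split(b_end, slot_min, with_fix);
		int right_entries = b_end - split;	/* slots split+1 .. b_end */
		int right_end = right_entries - 1;	/* the node's "end" index */

		printf("%s: split=%d, right entries=%d, end=%d -> %s\n",
		       with_fix ? "new check" : "old check",
		       split, right_entries, right_end,
		       right_end < slot_min ? "deficient" : "sufficient");
	}
	return 0;
}

With the old check the loop stops one step too late, leaving the right node
with end == 5, below the assumed minimum of 6; with the new check it stops
one step earlier and the right node stays sufficient.  The exact counts
depend on the node type's real mt_min_slots value, but the off-by-one
direction is the same either way.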