Message ID | 9d4c1f034038df2ae232b6665a0d9d7ee5833c5f.1715336798.git.ps@pks.im (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | reftable: expose write options as config | expand |
Patrick Steinhardt <ps@pks.im> writes: > When auto-compacting, the reftable library packs references such that > the sizes of the tables form a geometric sequence. The factor for this > geometric sequence is hardcoded to 2 right now. We're about to expose > this as a config option though, so let's expose the factor via write > options. Hmph. It is unclear if having this as uint8_t gives us a useful enhancement, but perhaps in the future hosters may find a more aggressive geometric sequence is better for their workload or something and raise it to 3 or 4? I was actually wondering if a base smaller than 2 (e.g. fibonacci) may work better. Anyway, making it configurable is a good first step. Allowing a bit finer grained setting than just integral values can be done later if it proves necessary.
On Fri, May 10, 2024 at 03:12:03PM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > When auto-compacting, the reftable library packs references such that > > the sizes of the tables form a geometric sequence. The factor for this > > geometric sequence is hardcoded to 2 right now. We're about to expose > > this as a config option though, so let's expose the factor via write > > options. > > Hmph. It is unclear if having this as uint8_t gives us a useful > enhancement, but perhaps in the future hosters may find a more > aggressive geometric sequence is better for their workload or > something and raise it to 3 or 4? I was actually wondering if a > base smaller than 2 (e.g. fibonacci) may work better. > > Anyway, making it configurable is a good first step. Allowing a bit > finer grained setting than just integral values can be done later if > it proves necessary. That's a fair point indeed. I had similar issues with git-repack(1)'s `--geometric=` option, where you can also only pick integers. There's also a similar discussion in the patch series by Taylor [1], where I proposed to maybe introduce floats into git-config(1). So this may be good enough for now, and when we gain the ability to parse floats we may convert this to accept floats, as well. An alternative would be to convert this to percent, where the default value would be 200. That should give sufficient flexibility without having to introduce floats. Patrick [1]: https://lore.kernel.org/git/Zjk2UIV3kEwZUDW+@nand.local/
Patrick Steinhardt <ps@pks.im> writes: > So this may be good enough for now, and when we gain the ability to > parse floats we may convert this to accept floats, as well. An > alternative would be to convert this to percent, where the default value > would be 200. That should give sufficient flexibility without having to > introduce floats. There already is an established way to specify float with arbitrary precision on the command line if we limit the value to 0..1, by the way. https://lore.kernel.org/git/Pine.LNX.4.58.0505191516350.2322@ppc970.osdl.org/ It is amusing to revisit an ancient discussion thread. I can see, on the same thread in the discussion following that message, that back in May 2005, we were discussing "intent to add" that did not get invented until mid 2008 already ;-).
On Mon, May 13, 2024 at 09:22:38AM -0700, Junio C Hamano wrote: > Patrick Steinhardt <ps@pks.im> writes: > > > So this may be good enough for now, and when we gain the ability to > > parse floats we may convert this to accept floats, as well. An > > alternative would be to convert this to percent, where the default value > > would be 200. That should give sufficient flexibility without having to > > introduce floats. > > There already is an established way to specify float with arbitrary > precision on the command line if we limit the value to 0..1, by the > way. > > https://lore.kernel.org/git/Pine.LNX.4.58.0505191516350.2322@ppc970.osdl.org/ The problem is that we don't want to limit the value to 0..1, but to 1..n. 1 really is the lowest semi-sensible number you can pick and means that all tables should have the same size. And in practice, there is no upper limit, even though it's probably not all that reasonable to pick anything beyond 10. In any case, I'd propose to keep this as-is for now. The simple reason is that we have preexisting commands (`git repack -g`) and in-flight series (pseudo-merge bitmaps) that also use plain integers to represent the geometric factor. So I'm aiming for consistency. From thereon we can then iterate and design a solution that works for all of them, e.g. by allowing for floats outside of the 0..1 range in whatever form. > It is amusing to revisit an ancient discussion thread. I can see, > on the same thread in the discussion following that message, that > back in May 2005, we were discussing "intent to add" that did not > get invented until mid 2008 already ;-). :) Patrick
diff --git a/reftable/constants.h b/reftable/constants.h index 5eee72c4c1..f6beb843eb 100644 --- a/reftable/constants.h +++ b/reftable/constants.h @@ -17,5 +17,6 @@ license that can be found in the LICENSE file or at #define MAX_RESTARTS ((1 << 16) - 1) #define DEFAULT_BLOCK_SIZE 4096 +#define DEFAULT_GEOMETRIC_FACTOR 2 #endif diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h index 4cd8ebe6c7..155457b042 100644 --- a/reftable/reftable-writer.h +++ b/reftable/reftable-writer.h @@ -49,6 +49,12 @@ struct reftable_write_options { /* boolean: Prevent auto-compaction of tables. */ unsigned disable_auto_compact : 1; + + /* + * Geometric sequence factor used by auto-compaction to decide which + * tables to compact. Defaults to 2 if unset. + */ + uint8_t auto_compaction_factor; }; /* reftable_block_stats holds statistics for a single block type */ diff --git a/reftable/stack.c b/reftable/stack.c index 7b4fff7c9e..762954b181 100644 --- a/reftable/stack.c +++ b/reftable/stack.c @@ -10,6 +10,7 @@ license that can be found in the LICENSE file or at #include "../write-or-die.h" #include "system.h" +#include "constants.h" #include "merged.h" #include "reader.h" #include "refname.h" @@ -1215,12 +1216,16 @@ static int segment_size(struct segment *s) return s->end - s->start; } -struct segment suggest_compaction_segment(uint64_t *sizes, size_t n) +struct segment suggest_compaction_segment(uint64_t *sizes, size_t n, + uint8_t factor) { struct segment seg = { 0 }; uint64_t bytes; size_t i; + if (!factor) + factor = DEFAULT_GEOMETRIC_FACTOR; + /* * If there are no tables or only a single one then we don't have to * compact anything. The sequence is geometric by definition already. @@ -1252,7 +1257,7 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n) * 64, 32, 16, 8, 4, 3, 1 */ for (i = n - 1; i > 0; i--) { - if (sizes[i - 1] < sizes[i] * 2) { + if (sizes[i - 1] < sizes[i] * factor) { seg.end = i + 1; bytes = sizes[i]; break; @@ -1278,7 +1283,7 @@ struct segment suggest_compaction_segment(uint64_t *sizes, size_t n) uint64_t curr = bytes; bytes += sizes[i - 1]; - if (sizes[i - 1] < curr * 2) { + if (sizes[i - 1] < curr * factor) { seg.start = i - 1; seg.bytes = bytes; } @@ -1304,7 +1309,8 @@ int reftable_stack_auto_compact(struct reftable_stack *st) { uint64_t *sizes = stack_table_sizes_for_compaction(st); struct segment seg = - suggest_compaction_segment(sizes, st->merged->stack_len); + suggest_compaction_segment(sizes, st->merged->stack_len, + st->opts.auto_compaction_factor); reftable_free(sizes); if (segment_size(&seg) > 0) return stack_compact_range_stats(st, seg.start, seg.end - 1, diff --git a/reftable/stack.h b/reftable/stack.h index 97d7ebc043..5b45cff4f7 100644 --- a/reftable/stack.h +++ b/reftable/stack.h @@ -35,6 +35,7 @@ struct segment { uint64_t bytes; }; -struct segment suggest_compaction_segment(uint64_t *sizes, size_t n); +struct segment suggest_compaction_segment(uint64_t *sizes, size_t n, + uint8_t factor); #endif diff --git a/reftable/stack_test.c b/reftable/stack_test.c index 3316d55f19..f6c11ef18d 100644 --- a/reftable/stack_test.c +++ b/reftable/stack_test.c @@ -767,7 +767,7 @@ static void test_suggest_compaction_segment(void) { uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 }; struct segment min = - suggest_compaction_segment(sizes, ARRAY_SIZE(sizes)); + suggest_compaction_segment(sizes, ARRAY_SIZE(sizes), 2); EXPECT(min.start == 1); EXPECT(min.end == 10); } @@ -776,7 +776,7 @@ static void test_suggest_compaction_segment_nothing(void) { uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 }; struct segment result = - suggest_compaction_segment(sizes, ARRAY_SIZE(sizes)); + suggest_compaction_segment(sizes, ARRAY_SIZE(sizes), 2); EXPECT(result.start == result.end); }
When auto-compacting, the reftable library packs references such that the sizes of the tables form a geometric sequence. The factor for this geometric sequence is hardcoded to 2 right now. We're about to expose this as a config option though, so let's expose the factor via write options. Signed-off-by: Patrick Steinhardt <ps@pks.im> --- reftable/constants.h | 1 + reftable/reftable-writer.h | 6 ++++++ reftable/stack.c | 14 ++++++++++---- reftable/stack.h | 3 ++- reftable/stack_test.c | 4 ++-- 5 files changed, 21 insertions(+), 7 deletions(-)