diff mbox series

[v2,10/11] reftable: make the compaction factor configurable

Message ID 9d4c1f034038df2ae232b6665a0d9d7ee5833c5f.1715336798.git.ps@pks.im (mailing list archive)
State Superseded
Headers show
Series reftable: expose write options as config | expand

Commit Message

Patrick Steinhardt May 10, 2024, 10:30 a.m. UTC
When auto-compacting, the reftable library packs references such that
the sizes of the tables form a geometric sequence. The factor for this
geometric sequence is hardcoded to 2 right now. We're about to expose
this as a config option though, so let's expose the factor via write
options.

Signed-off-by: Patrick Steinhardt <ps@pks.im>
---
 reftable/constants.h       |  1 +
 reftable/reftable-writer.h |  6 ++++++
 reftable/stack.c           | 14 ++++++++++----
 reftable/stack.h           |  3 ++-
 reftable/stack_test.c      |  4 ++--
 5 files changed, 21 insertions(+), 7 deletions(-)

Comments

Junio C Hamano May 10, 2024, 10:12 p.m. UTC | #1
Patrick Steinhardt <ps@pks.im> writes:

> When auto-compacting, the reftable library packs references such that
> the sizes of the tables form a geometric sequence. The factor for this
> geometric sequence is hardcoded to 2 right now. We're about to expose
> this as a config option though, so let's expose the factor via write
> options.

Hmph.  It is unclear if having this as uint8_t gives us a useful
enhancement, but perhaps in the future hosters may find a more
aggressive geometric sequence is better for their workload or
something and raise it to 3 or 4?  I was actually wondering if a
base smaller than 2 (e.g. fibonacci) may work better.

Anyway, making it configurable is a good first step.  Allowing a bit
finer grained setting than just integral values can be done later if
it proves necessary.
Patrick Steinhardt May 13, 2024, 7:54 a.m. UTC | #2
On Fri, May 10, 2024 at 03:12:03PM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > When auto-compacting, the reftable library packs references such that
> > the sizes of the tables form a geometric sequence. The factor for this
> > geometric sequence is hardcoded to 2 right now. We're about to expose
> > this as a config option though, so let's expose the factor via write
> > options.
> 
> Hmph.  It is unclear if having this as uint8_t gives us a useful
> enhancement, but perhaps in the future hosters may find a more
> aggressive geometric sequence is better for their workload or
> something and raise it to 3 or 4?  I was actually wondering if a
> base smaller than 2 (e.g. fibonacci) may work better.
> 
> Anyway, making it configurable is a good first step.  Allowing a bit
> finer grained setting than just integral values can be done later if
> it proves necessary.

That's a fair point indeed. I had similar issues with git-repack(1)'s
`--geometric=` option, where you can also only pick integers. There's
also a similar discussion in the patch series by Taylor [1], where I
proposed to maybe introduce floats into git-config(1).

So this may be good enough for now, and when we gain the ability to
parse floats we may convert this to accept floats, as well. An
alternative would be to convert this to percent, where the default value
would be 200. That should give sufficient flexibility without having to
introduce floats.

Patrick

[1]: https://lore.kernel.org/git/Zjk2UIV3kEwZUDW+@nand.local/
Junio C Hamano May 13, 2024, 4:22 p.m. UTC | #3
Patrick Steinhardt <ps@pks.im> writes:

> So this may be good enough for now, and when we gain the ability to
> parse floats we may convert this to accept floats, as well. An
> alternative would be to convert this to percent, where the default value
> would be 200. That should give sufficient flexibility without having to
> introduce floats.

There already is an established way to specify float with arbitrary
precision on the command line if we limit the value to 0..1, by the
way.

https://lore.kernel.org/git/Pine.LNX.4.58.0505191516350.2322@ppc970.osdl.org/

It is amusing to revisit an ancient discussion thread.  I can see,
on the same thread in the discussion following that message, that
back in May 2005, we were discussing "intent to add" that did not
get invented until mid 2008 already ;-).
Patrick Steinhardt May 14, 2024, 4:54 a.m. UTC | #4
On Mon, May 13, 2024 at 09:22:38AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <ps@pks.im> writes:
> 
> > So this may be good enough for now, and when we gain the ability to
> > parse floats we may convert this to accept floats, as well. An
> > alternative would be to convert this to percent, where the default value
> > would be 200. That should give sufficient flexibility without having to
> > introduce floats.
> 
> There already is an established way to specify float with arbitrary
> precision on the command line if we limit the value to 0..1, by the
> way.
> 
> https://lore.kernel.org/git/Pine.LNX.4.58.0505191516350.2322@ppc970.osdl.org/

The problem is that we don't want to limit the value to 0..1, but to
1..n. 1 really is the lowest semi-sensible number you can pick and means
that all tables should have the same size. And in practice, there is no
upper limit, even though it's probably not all that reasonable to pick
anything beyond 10.

In any case, I'd propose to keep this as-is for now. The simple reason
is that we have preexisting commands (`git repack -g`) and in-flight
series (pseudo-merge bitmaps) that also use plain integers to represent
the geometric factor. So I'm aiming for consistency. From thereon we can
then iterate and design a solution that works for all of them, e.g. by
allowing for floats outside of the 0..1 range in whatever form.

> It is amusing to revisit an ancient discussion thread.  I can see,
> on the same thread in the discussion following that message, that
> back in May 2005, we were discussing "intent to add" that did not
> get invented until mid 2008 already ;-).

:)

Patrick
diff mbox series

Patch

diff --git a/reftable/constants.h b/reftable/constants.h
index 5eee72c4c1..f6beb843eb 100644
--- a/reftable/constants.h
+++ b/reftable/constants.h
@@ -17,5 +17,6 @@  license that can be found in the LICENSE file or at
 
 #define MAX_RESTARTS ((1 << 16) - 1)
 #define DEFAULT_BLOCK_SIZE 4096
+#define DEFAULT_GEOMETRIC_FACTOR 2
 
 #endif
diff --git a/reftable/reftable-writer.h b/reftable/reftable-writer.h
index 4cd8ebe6c7..155457b042 100644
--- a/reftable/reftable-writer.h
+++ b/reftable/reftable-writer.h
@@ -49,6 +49,12 @@  struct reftable_write_options {
 
 	/* boolean: Prevent auto-compaction of tables. */
 	unsigned disable_auto_compact : 1;
+
+	/*
+	 * Geometric sequence factor used by auto-compaction to decide which
+	 * tables to compact. Defaults to 2 if unset.
+	 */
+	uint8_t auto_compaction_factor;
 };
 
 /* reftable_block_stats holds statistics for a single block type */
diff --git a/reftable/stack.c b/reftable/stack.c
index 7b4fff7c9e..762954b181 100644
--- a/reftable/stack.c
+++ b/reftable/stack.c
@@ -10,6 +10,7 @@  license that can be found in the LICENSE file or at
 
 #include "../write-or-die.h"
 #include "system.h"
+#include "constants.h"
 #include "merged.h"
 #include "reader.h"
 #include "refname.h"
@@ -1215,12 +1216,16 @@  static int segment_size(struct segment *s)
 	return s->end - s->start;
 }
 
-struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
+struct segment suggest_compaction_segment(uint64_t *sizes, size_t n,
+					  uint8_t factor)
 {
 	struct segment seg = { 0 };
 	uint64_t bytes;
 	size_t i;
 
+	if (!factor)
+		factor = DEFAULT_GEOMETRIC_FACTOR;
+
 	/*
 	 * If there are no tables or only a single one then we don't have to
 	 * compact anything. The sequence is geometric by definition already.
@@ -1252,7 +1257,7 @@  struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 	 * 	64, 32, 16, 8, 4, 3, 1
 	 */
 	for (i = n - 1; i > 0; i--) {
-		if (sizes[i - 1] < sizes[i] * 2) {
+		if (sizes[i - 1] < sizes[i] * factor) {
 			seg.end = i + 1;
 			bytes = sizes[i];
 			break;
@@ -1278,7 +1283,7 @@  struct segment suggest_compaction_segment(uint64_t *sizes, size_t n)
 		uint64_t curr = bytes;
 		bytes += sizes[i - 1];
 
-		if (sizes[i - 1] < curr * 2) {
+		if (sizes[i - 1] < curr * factor) {
 			seg.start = i - 1;
 			seg.bytes = bytes;
 		}
@@ -1304,7 +1309,8 @@  int reftable_stack_auto_compact(struct reftable_stack *st)
 {
 	uint64_t *sizes = stack_table_sizes_for_compaction(st);
 	struct segment seg =
-		suggest_compaction_segment(sizes, st->merged->stack_len);
+		suggest_compaction_segment(sizes, st->merged->stack_len,
+					   st->opts.auto_compaction_factor);
 	reftable_free(sizes);
 	if (segment_size(&seg) > 0)
 		return stack_compact_range_stats(st, seg.start, seg.end - 1,
diff --git a/reftable/stack.h b/reftable/stack.h
index 97d7ebc043..5b45cff4f7 100644
--- a/reftable/stack.h
+++ b/reftable/stack.h
@@ -35,6 +35,7 @@  struct segment {
 	uint64_t bytes;
 };
 
-struct segment suggest_compaction_segment(uint64_t *sizes, size_t n);
+struct segment suggest_compaction_segment(uint64_t *sizes, size_t n,
+					  uint8_t factor);
 
 #endif
diff --git a/reftable/stack_test.c b/reftable/stack_test.c
index 3316d55f19..f6c11ef18d 100644
--- a/reftable/stack_test.c
+++ b/reftable/stack_test.c
@@ -767,7 +767,7 @@  static void test_suggest_compaction_segment(void)
 {
 	uint64_t sizes[] = { 512, 64, 17, 16, 9, 9, 9, 16, 2, 16 };
 	struct segment min =
-		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes), 2);
 	EXPECT(min.start == 1);
 	EXPECT(min.end == 10);
 }
@@ -776,7 +776,7 @@  static void test_suggest_compaction_segment_nothing(void)
 {
 	uint64_t sizes[] = { 64, 32, 16, 8, 4, 2 };
 	struct segment result =
-		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes));
+		suggest_compaction_segment(sizes, ARRAY_SIZE(sizes), 2);
 	EXPECT(result.start == result.end);
 }