Message ID | 8564f98259727225391edcb5ab3b47dd53f00e48.1740680964.git.me@ttaylorr.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | pack-objects: freshen objects with multi-cruft packs | expand |
On Thu, Feb 27, 2025 at 10:29 AM Taylor Blau <me@ttaylorr.com> wrote: > > In 37dc6d8104 (builtin/repack.c: implement support for > `--max-cruft-size`, 2023-10-02), 'git repack' built on support for > multiple cruft packs in Git by instructing 'git pack-objects --cruft' > how to aggregate smaller cruft packs up to the provided threshold. > > The implementation in 37dc6d8104 worked something like the following > pseudo-code: > > total_size = 0; > > for (p in cruft packs) { > if (p->pack_size + total_size < max_size) { > total_size += p->pack_size; > collapse(p) > } else { > retain(p); > } > } > > The original idea behind this approach was that smaller cruft packs > would get combined together until the sum of their sizes was no larger > than the given max pack size. > > There is a much simpler way to achieve this, however, which is to simply > combine *all* cruft packs which are smaller than the threshold, > regardless of what their sum is. With '--max-pack-size', 'pack-objects' > will split out the resulting pack into individual pack(s) if necessary > to ensure that the written pack(s) are each no larger than the provided > threshold. That doesn't really "achieve this" though, unless the antecedent of "this" isn't what was described in the previous paragraph but something elsewhere. I suspect your actual meaning was something along the lines of "There is a much simpler way to combine cruft packs, however, which..." ? > This yields a slight behavior change, which is reflected in the removed > test. Previous to this change, we would aggregate smaller cruft packs > first, whereas now we will opportunistically combine as many cruft packs > as possible. As as result, that test is no longer relevant, and can be > deleted. I like the idea, since it sounds like it should be simpler... > Signed-off-by: Taylor Blau <me@ttaylorr.com> > --- > builtin/repack.c | 38 ++----------------------------------- > t/t7704-repack-cruft.sh | 42 ----------------------------------------- > 2 files changed, 2 insertions(+), 78 deletions(-) > > diff --git a/builtin/repack.c b/builtin/repack.c > index 75e3752353a..4d83d40f39f 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -1022,29 +1022,13 @@ static int write_filtered_pack(const struct pack_objects_args *args, > return finish_pack_objects_cmd(&cmd, names, local); > } > > -static int existing_cruft_pack_cmp(const void *va, const void *vb) > -{ > - struct packed_git *a = *(struct packed_git **)va; > - struct packed_git *b = *(struct packed_git **)vb; > - > - if (a->pack_size < b->pack_size) > - return -1; > - if (a->pack_size > b->pack_size) > - return 1; > - return 0; > -} > - > static void collapse_small_cruft_packs(FILE *in, size_t max_size, > struct existing_packs *existing) > { > - struct packed_git **existing_cruft, *p; > + struct packed_git *p; > struct strbuf buf = STRBUF_INIT; > - size_t total_size = 0; > - size_t existing_cruft_nr = 0; > size_t i; > > - ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr); > - > for (p = get_all_packs(the_repository); p; p = p->next) { > if (!(p->is_cruft && p->pack_local)) > continue; > @@ -1056,24 +1040,7 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size, > if (!string_list_has_string(&existing->cruft_packs, buf.buf)) > continue; > > - if (existing_cruft_nr >= existing->cruft_packs.nr) > - BUG("too many cruft packs (found %"PRIuMAX", but knew " > - "of %"PRIuMAX")", > - (uintmax_t)existing_cruft_nr + 1, > - (uintmax_t)existing->cruft_packs.nr); > - existing_cruft[existing_cruft_nr++] = p; > - } > - > - QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp); > - > - for (i = 0; i < existing_cruft_nr; i++) { > - size_t proposed; > - > - p = existing_cruft[i]; > - proposed = st_add(total_size, p->pack_size); > - > - if (proposed <= max_size) { > - total_size = proposed; > + if (p->pack_size < max_size) { Look at all that deleted code. Always nice to see a simplification in action. :-) > fprintf(in, "-%s\n", pack_basename(p)); > } else { > retain_cruft_pack(existing, p); > @@ -1086,7 +1053,6 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size, > existing->non_kept_packs.items[i].string); > > strbuf_release(&buf); > - free(existing_cruft); > } > > static int write_cruft_pack(const struct pack_objects_args *args, > diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh > index 959e6e26488..5a76b541ddd 100755 > --- a/t/t7704-repack-cruft.sh > +++ b/t/t7704-repack-cruft.sh > @@ -194,48 +194,6 @@ test_expect_success '--max-cruft-size combines existing packs when below thresho > ) > ' > > -test_expect_success '--max-cruft-size combines smaller packs first' ' > - git init max-cruft-size-consume-small && > - ( > - cd max-cruft-size-consume-small && > - > - test_commit base && > - git repack -ad && > - > - cruft_foo="$(generate_cruft_pack foo 524288)" && # 0.5 MiB > - cruft_bar="$(generate_cruft_pack bar 524288)" && # 0.5 MiB > - cruft_baz="$(generate_cruft_pack baz 1048576)" && # 1.0 MiB > - cruft_quux="$(generate_cruft_pack quux 1572864)" && # 1.5 MiB > - > - test-tool pack-mtimes "$(basename $cruft_foo)" >expect.raw && > - test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw && > - sort expect.raw >expect.objects && > - > - # repacking with `--max-cruft-size=2M` should combine > - # both 0.5 MiB packs together, instead of, say, one of > - # the 0.5 MiB packs with the 1.0 MiB pack > - ls $packdir/pack-*.mtimes | sort >cruft.before && > - git repack -d --cruft --max-cruft-size=2M && > - ls $packdir/pack-*.mtimes | sort >cruft.after && > - > - comm -13 cruft.before cruft.after >cruft.new && > - comm -23 cruft.before cruft.after >cruft.removed && > - > - test_line_count = 1 cruft.new && > - test_line_count = 2 cruft.removed && > - > - # the two smaller packs should be rolled up first > - printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed && > - test_cmp expect.removed cruft.removed && > - > - # ...and contain the set of objects rolled up > - test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw && > - sort actual.raw >actual.objects && > - > - test_cmp expect.objects actual.objects > - ) > -' > - > test_expect_success 'setup --max-cruft-size with freshened objects' ' > git init max-cruft-size-freshen && > ( > -- > 2.49.0.rc0.2.gc0c926adde2 Looks good to me, other than the misleading/ambiguous wording in that one paragraph of the commit message.
On Thu, Feb 27, 2025 at 11:23:02AM -0800, Elijah Newren wrote: > > The original idea behind this approach was that smaller cruft packs > > would get combined together until the sum of their sizes was no larger > > than the given max pack size. > > > > There is a much simpler way to achieve this, however, which is to simply > > combine *all* cruft packs which are smaller than the threshold, > > regardless of what their sum is. With '--max-pack-size', 'pack-objects' > > will split out the resulting pack into individual pack(s) if necessary > > to ensure that the written pack(s) are each no larger than the provided > > threshold. > > That doesn't really "achieve this" though, unless the antecedent of > "this" isn't what was described in the previous paragraph but > something elsewhere. I suspect your actual meaning was something > along the lines of "There is a much simpler way to combine cruft > packs, however, which..." ? Great suggestion, thanks. I swapped out "achieve this" for your recommendation. > > This yields a slight behavior change, which is reflected in the removed > > test. Previous to this change, we would aggregate smaller cruft packs > > first, whereas now we will opportunistically combine as many cruft packs > > as possible. As as result, that test is no longer relevant, and can be > > deleted. > > I like the idea, since it sounds like it should be simpler... Heh. I don't know why I wrote it the way it was originally. I wrote the second patch in this series first, and when I was trying to explain how multi-cruft pack aggregation works I paused and then wrote what is now the first patch. Hindsight is 20/20, I suppose ;-). Thanks, Taylor
On Thu, Feb 27, 2025 at 01:29:28PM -0500, Taylor Blau wrote: > In 37dc6d8104 (builtin/repack.c: implement support for > `--max-cruft-size`, 2023-10-02), 'git repack' built on support for > multiple cruft packs in Git by instructing 'git pack-objects --cruft' > how to aggregate smaller cruft packs up to the provided threshold. > > The implementation in 37dc6d8104 worked something like the following > pseudo-code: > > total_size = 0; > > for (p in cruft packs) { > if (p->pack_size + total_size < max_size) { > total_size += p->pack_size; > collapse(p) > } else { > retain(p); > } > } > > The original idea behind this approach was that smaller cruft packs > would get combined together until the sum of their sizes was no larger > than the given max pack size. > > There is a much simpler way to achieve this, however, which is to simply > combine *all* cruft packs which are smaller than the threshold, > regardless of what their sum is. With '--max-pack-size', 'pack-objects' > will split out the resulting pack into individual pack(s) if necessary > to ensure that the written pack(s) are each no larger than the provided > threshold. Hm. So the result would be a new set of packfiles where each of them is smaller than the threshold, right? Wouldn't that mean that the next time we'll again do the same thing and try to combine the new set of cruft packs into one, and basically never arrive at a state where we don't touch the cruft packs anymore? Patrick
diff --git a/builtin/repack.c b/builtin/repack.c index 75e3752353a..4d83d40f39f 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -1022,29 +1022,13 @@ static int write_filtered_pack(const struct pack_objects_args *args, return finish_pack_objects_cmd(&cmd, names, local); } -static int existing_cruft_pack_cmp(const void *va, const void *vb) -{ - struct packed_git *a = *(struct packed_git **)va; - struct packed_git *b = *(struct packed_git **)vb; - - if (a->pack_size < b->pack_size) - return -1; - if (a->pack_size > b->pack_size) - return 1; - return 0; -} - static void collapse_small_cruft_packs(FILE *in, size_t max_size, struct existing_packs *existing) { - struct packed_git **existing_cruft, *p; + struct packed_git *p; struct strbuf buf = STRBUF_INIT; - size_t total_size = 0; - size_t existing_cruft_nr = 0; size_t i; - ALLOC_ARRAY(existing_cruft, existing->cruft_packs.nr); - for (p = get_all_packs(the_repository); p; p = p->next) { if (!(p->is_cruft && p->pack_local)) continue; @@ -1056,24 +1040,7 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size, if (!string_list_has_string(&existing->cruft_packs, buf.buf)) continue; - if (existing_cruft_nr >= existing->cruft_packs.nr) - BUG("too many cruft packs (found %"PRIuMAX", but knew " - "of %"PRIuMAX")", - (uintmax_t)existing_cruft_nr + 1, - (uintmax_t)existing->cruft_packs.nr); - existing_cruft[existing_cruft_nr++] = p; - } - - QSORT(existing_cruft, existing_cruft_nr, existing_cruft_pack_cmp); - - for (i = 0; i < existing_cruft_nr; i++) { - size_t proposed; - - p = existing_cruft[i]; - proposed = st_add(total_size, p->pack_size); - - if (proposed <= max_size) { - total_size = proposed; + if (p->pack_size < max_size) { fprintf(in, "-%s\n", pack_basename(p)); } else { retain_cruft_pack(existing, p); @@ -1086,7 +1053,6 @@ static void collapse_small_cruft_packs(FILE *in, size_t max_size, existing->non_kept_packs.items[i].string); strbuf_release(&buf); - free(existing_cruft); } static int write_cruft_pack(const struct pack_objects_args *args, diff --git a/t/t7704-repack-cruft.sh b/t/t7704-repack-cruft.sh index 959e6e26488..5a76b541ddd 100755 --- a/t/t7704-repack-cruft.sh +++ b/t/t7704-repack-cruft.sh @@ -194,48 +194,6 @@ test_expect_success '--max-cruft-size combines existing packs when below thresho ) ' -test_expect_success '--max-cruft-size combines smaller packs first' ' - git init max-cruft-size-consume-small && - ( - cd max-cruft-size-consume-small && - - test_commit base && - git repack -ad && - - cruft_foo="$(generate_cruft_pack foo 524288)" && # 0.5 MiB - cruft_bar="$(generate_cruft_pack bar 524288)" && # 0.5 MiB - cruft_baz="$(generate_cruft_pack baz 1048576)" && # 1.0 MiB - cruft_quux="$(generate_cruft_pack quux 1572864)" && # 1.5 MiB - - test-tool pack-mtimes "$(basename $cruft_foo)" >expect.raw && - test-tool pack-mtimes "$(basename $cruft_bar)" >>expect.raw && - sort expect.raw >expect.objects && - - # repacking with `--max-cruft-size=2M` should combine - # both 0.5 MiB packs together, instead of, say, one of - # the 0.5 MiB packs with the 1.0 MiB pack - ls $packdir/pack-*.mtimes | sort >cruft.before && - git repack -d --cruft --max-cruft-size=2M && - ls $packdir/pack-*.mtimes | sort >cruft.after && - - comm -13 cruft.before cruft.after >cruft.new && - comm -23 cruft.before cruft.after >cruft.removed && - - test_line_count = 1 cruft.new && - test_line_count = 2 cruft.removed && - - # the two smaller packs should be rolled up first - printf "%s\n" $cruft_foo $cruft_bar | sort >expect.removed && - test_cmp expect.removed cruft.removed && - - # ...and contain the set of objects rolled up - test-tool pack-mtimes "$(basename $(cat cruft.new))" >actual.raw && - sort actual.raw >actual.objects && - - test_cmp expect.objects actual.objects - ) -' - test_expect_success 'setup --max-cruft-size with freshened objects' ' git init max-cruft-size-freshen && (
In 37dc6d8104 (builtin/repack.c: implement support for `--max-cruft-size`, 2023-10-02), 'git repack' built on support for multiple cruft packs in Git by instructing 'git pack-objects --cruft' how to aggregate smaller cruft packs up to the provided threshold. The implementation in 37dc6d8104 worked something like the following pseudo-code: total_size = 0; for (p in cruft packs) { if (p->pack_size + total_size < max_size) { total_size += p->pack_size; collapse(p) } else { retain(p); } } The original idea behind this approach was that smaller cruft packs would get combined together until the sum of their sizes was no larger than the given max pack size. There is a much simpler way to achieve this, however, which is to simply combine *all* cruft packs which are smaller than the threshold, regardless of what their sum is. With '--max-pack-size', 'pack-objects' will split out the resulting pack into individual pack(s) if necessary to ensure that the written pack(s) are each no larger than the provided threshold. This yields a slight behavior change, which is reflected in the removed test. Previous to this change, we would aggregate smaller cruft packs first, whereas now we will opportunistically combine as many cruft packs as possible. As as result, that test is no longer relevant, and can be deleted. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- builtin/repack.c | 38 ++----------------------------------- t/t7704-repack-cruft.sh | 42 ----------------------------------------- 2 files changed, 2 insertions(+), 78 deletions(-)