Message ID | 20190314091254.nescpfp3n6mbjpmh@dcvr (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] repack: enable bitmaps by default on bare repos | expand |
On Thu, Mar 14, 2019 at 09:12:54AM +0000, Eric Wong wrote: > > The reason it defaults to off is for on-disk compatibility with JGit. > > Right. Our documentation seems to indicate JGit just warns (but > doesn't fall over), so maybe that can be considered separately. I think it was a hard error in the beginning, but they changed it pretty soon after we added more flags. So it might be reasonable to just enable it by default (but it wouldn't hurt to double check the behavior). I tried running t5310 (which does a back-and-forth with jgit) using this patch: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a154fc29f6..5264bf055a 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -97,7 +97,7 @@ static off_t reuse_packfile_offset; static int use_bitmap_index_default = 1; static int use_bitmap_index = -1; static int write_bitmap_index; -static uint16_t write_bitmap_options; +static uint16_t write_bitmap_options = BITMAP_OPT_HASH_CACHE; static int exclude_promisor_objects; and it seemed happy. > As far as serving clones and fetches, public-inbox-init has > always created bare repos with bitmaps enabled, but without > the hash-cache for compatibility concerns. > > That's a lot of fetches and clones over the years. The symptom you'd see is that "Compressing objects" during a fetch takes a long time, and/or produces lousy deltas. But it matters less if: - you keep things packed pretty promptly, because we don't bother looking for new deltas between objects in the same pack. Just trying to clone public-inbox.org/git, it does look like it's mostly packed (based on the object counts) but the compression phase still takes 10+ seconds. - how much the names actually help. In your case, I'd think not at all, because being based on hashes, they're effectively random. So the pack-objects heuristics to try to find deltas between files of similar filenames will not help you. Regarding the second thing, I wondered if the overall packing of your public-inbox git repo might not be good, so I did a "git repack -adf --window=1000" on a clone. Hundreds of CPU minutes later, I was only able to shave off about 80MB. I'm not sure if that means you occasionally do very aggressive repacks, or if there simply isn't all that much delta opportunity (after all, you're not storing many versions of one file, but rather tons of different emails; I would expect to find deltas between various versions of a patch, though). Anyway... > ---------8<----------- > Subject: [PATCH] repack: enable bitmaps by default on bare repos > > A typical use case for bare repos is for serving clones and > fetches to clients. Enable bitmaps by default on bare repos to > make it easier for admins to host git repos in a performant way. > > Signed-off-by: Eric Wong <e@80x24.org> > Helped-by: Jeff King <peff@peff.net> This version looks good to me. If we're going to flip the hash-cache default, I think that should be a separate patch anyway. -Peff
On Thu, Mar 14 2019, Eric Wong wrote: > Jeff King <peff@peff.net> wrote: >> On Wed, Mar 13, 2019 at 01:51:33AM +0000, Eric Wong wrote: >> >> > But I did find Ævar's forgotten gitperformance doc and thread >> > where the topic was brought up: >> > >> > https://public-inbox.org/git/20170403211644.26814-1-avarab@gmail.com/ >> >> One thing that thread reminded me of: we probably also want to default >> pack.writebitmaphashcache on. Otherwise the time saved during the object >> enumeration can backfire when we spend much more time trying delta >> compression (because we don't know the pathnames of any objects). > > Interesting... I found a big improvement with public-inbox > just using bitmaps; but have never tried the hash cache. > >> The reason it defaults to off is for on-disk compatibility with JGit. > > Right. Our documentation seems to indicate JGit just warns (but > doesn't fall over), so maybe that can be considered separately. > > I've never used JGit myself; and was satisfied enough with > bitmaps alone that I never tried the hash-cache. > >> But I have very little experience running without the hash-cache on. We >> added it very early on because we found performance was not great >> without it (I don't know if people running JGit have run into the same >> problem and if not, why not). > > As far as serving clones and fetches, public-inbox-init has > always created bare repos with bitmaps enabled, but without > the hash-cache for compatibility concerns. > > That's a lot of fetches and clones over the years. > >> > +test_expect_success 'incremental repack does not complain' ' >> > + git -C bare.git repack -q 2>repack.err && >> > + ! test -s repack.err >> > +' >> >> This last line could use "test_must_be_empty". > > Thanks for the review! > > ---------8<----------- > Subject: [PATCH] repack: enable bitmaps by default on bare repos > > A typical use case for bare repos is for serving clones and > fetches to clients. Enable bitmaps by default on bare repos to > make it easier for admins to host git repos in a performant way. > > Signed-off-by: Eric Wong <e@80x24.org> > Helped-by: Jeff King <peff@peff.net> > --- > Documentation/config/repack.txt | 2 +- > builtin/repack.c | 5 ++++- > t/t7700-repack.sh | 19 ++++++++++++++++++- > 3 files changed, 23 insertions(+), 3 deletions(-) > > diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt > index a5c37813fd..9c413e177e 100644 > --- a/Documentation/config/repack.txt > +++ b/Documentation/config/repack.txt > @@ -24,4 +24,4 @@ repack.writeBitmaps:: > packs created for clones and fetches, at the cost of some disk > space and extra time spent on the initial repack. This has > no effect if multiple packfiles are created. > - Defaults to false. > + Defaults to true on bare repos, false otherwise. > diff --git a/builtin/repack.c b/builtin/repack.c > index 67f8978043..caca113927 100644 > --- a/builtin/repack.c > +++ b/builtin/repack.c > @@ -14,7 +14,7 @@ > > static int delta_base_offset = 1; > static int pack_kept_objects = -1; > -static int write_bitmaps; > +static int write_bitmaps = -1; > static int use_delta_islands; > static char *packdir, *packtmp; > > @@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) > (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE))) > die(_("--keep-unreachable and -A are incompatible")); > > + if (write_bitmaps < 0) > + write_bitmaps = (pack_everything & ALL_INTO_ONE) && > + is_bare_repository(); > if (pack_kept_objects < 0) > pack_kept_objects = write_bitmaps; > > diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh > index 6162e2a8e6..86d05160a3 100755 > --- a/t/t7700-repack.sh > +++ b/t/t7700-repack.sh > @@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' ' > ) > ' > > -test_done > +test_expect_success 'bitmaps are created by default in bare repos' ' > + git clone --bare .git bare.git && > + git -C bare.git repack -ad && > + bitmap=$(ls bare.git/objects/pack/*.bitmap) && > + test_path_is_file "$bitmap" > +' > + > +test_expect_success 'incremental repack does not complain' ' > + git -C bare.git repack -q 2>repack.err && > + test_must_be_empty repack.err > +' > > +test_expect_success 'bitmaps can be disabled on bare repos' ' > + git -c repack.writeBitmaps=false -C bare.git repack -ad && > + bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) && > + test -z "$bitmap" > +' > + > +test_done I've found a case where turning bitmaps on does horrible things for bitmap "push" performance. As it turns out it's not related to this patch per-se, because I had a *.bitmap for other reasons, but replying to this because we'd presumably get the same thing in the bare repo case once this merges down. I can't share the repo, but I had a report where just a "git push" of a topic branch that was 2/58 ahead/behind took ~2 minutes just in "Enumerating objects", but ~500ms without bitmaps. Using a horrible "print to stderr"[1] monkeypatch I'd get without bitmaps and reported by trace2 / ts: Apr 09 16:45:15 16:45:15.365817 git.c:433 | d1 | main | cmd_name | | | | | pack-objects (push/pack-objects) Apr 09 16:45:15 16:45:15.366220 builtin/pack-objects.c:3493 | d1 | main | region_enter | r1 | 0.000928 | | pack-objects | label:enumerate-objects Apr 09 16:45:15 16:45:15.366241 builtin/pack-objects.c:3495 | d1 | main | region_enter | r1 | 0.000950 | | pack-objects | ..label:enumerate-objects-prepare-packing-data Apr 09 16:45:15 16:45:15.366384 builtin/pack-objects.c:3498 | d1 | main | region_leave | r1 | 0.001091 | 0.000141 | pack-objects | ..label:enumerate-objects-prepare-packing-data Apr 09 16:45:15 16:45:15.366394 builtin/pack-objects.c:3510 | d1 | main | region_enter | r1 | 0.001102 | | pack-objects | ..label:enumerate-objects-get-obj-list Apr 09 16:45:15 get obj list 1 Apr 09 16:45:15 get obj list 2, did 29391 lines Apr 09 16:45:15 get obj list 3 Apr 09 16:45:15 get obj list 4 Apr 09 16:45:15 get obj list 5 Apr 09 16:45:15 get obj list 6 Apr 09 16:45:15 get obj list 7 Apr 09 16:45:15 get obj list 8 Apr 09 16:45:15 get obj list 9 Apr 09 16:45:15 16:45:15.776559 builtin/pack-objects.c:3514 | d1 | main | region_leave | r1 | 0.411263 | 0.410161 | pack-objects | ..label:enumerate-objects-get-obj-list Apr 09 16:45:15 16:45:15.776577 builtin/pack-objects.c:3517 | d1 | main | region_enter | r1 | 0.411285 | | pack-objects | ..label:enumerate-objects-cleanup-preferred-base Apr 09 16:45:15 16:45:15.776584 builtin/pack-objects.c:3520 | d1 | main | region_leave | r1 | 0.411292 | 0.000007 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base Apr 09 16:45:15 16:45:15.776605 builtin/pack-objects.c:3530 | d1 | main | region_leave | r1 | 0.411313 | 0.410385 | pack-objects | label:enumerate-objects Apr 09 16:45:15 16:45:15.776609 builtin/pack-objects.c:3542 | d1 | main | region_enter | r1 | 0.411318 | | pack-objects | label:write-pack-file Apr 09 16:45:15 16:45:15.794235 builtin/pack-objects.c:3544 | d1 | main | region_leave | r1 | 0.428942 | 0.017624 | pack-objects | label:write-pack-file But with pack.useBitmaps=true: Apr 09 16:39:59 16:39:59.139022 git.c:433 | d1 | main | cmd_name | | | | | pack-objects (push/pack-objects) Apr 09 16:39:59 16:39:59.139398 builtin/pack-objects.c:3493 | d1 | main | region_enter | r1 | 0.000869 | | pack-objects | label:enumerate-objects Apr 09 16:39:59 16:39:59.139419 builtin/pack-objects.c:3495 | d1 | main | region_enter | r1 | 0.000892 | | pack-objects | ..label:enumerate-objects-prepare-packing-data Apr 09 16:39:59 16:39:59.139551 builtin/pack-objects.c:3498 | d1 | main | region_leave | r1 | 0.001023 | 0.000131 | pack-objects | ..label:enumerate-objects-prepare-packing-data Apr 09 16:39:59 16:39:59.139559 builtin/pack-objects.c:3510 | d1 | main | region_enter | r1 | 0.001032 | | pack-objects | ..label:enumerate-objects-get-obj-list Apr 09 16:39:59 get obj list 1 Apr 09 16:39:59 get obj list 2, did 29392 lines Apr 09 16:39:59 get obj list 3 Apr 09 16:39:59 prepping walk Apr 09 16:39:59 opening packed bitmap... Apr 09 16:39:59 opening packed bitmap done Apr 09 16:39:59 walking 29392 pending Apr 09 16:39:59 done walking 29392 pending Apr 09 16:39:59 prepare_bitmap_walk 3 Apr 09 16:39:59 prepare_bitmap_walk 4 Apr 09 16:39:59 prepare_bitmap_walk 5 Apr 09 16:40:00 prepare_bitmap_walk 6 Apr 09 16:40:00 prepare_bitmap_walk 6.1 Apr 09 16:41:35 prepare_bitmap_walk 6.2 Apr 09 16:41:35 prepare_bitmap_walk 7 Apr 09 16:41:52 prepare_bitmap_walk 8 Apr 09 16:41:52 walking? Apr 09 16:41:52 traversing Apr 09 16:41:52 traversing done Apr 09 16:41:52 16:41:52.091634 builtin/pack-objects.c:3514 | d1 | main | region_leave | r1 | 112.953099 | 112.952067 | pack-objects | ..label:enumerate-objects-get-obj-list Apr 09 16:41:52 16:41:52.091655 builtin/pack-objects.c:3517 | d1 | main | region_enter | r1 | 112.953128 | | pack-objects | ..label:enumerate-objects-cleanup-preferred-base Apr 09 16:41:52 16:41:52.091668 builtin/pack-objects.c:3520 | d1 | main | region_leave | r1 | 112.953141 | 0.000013 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base Apr 09 16:41:52 16:41:52.091700 builtin/pack-objects.c:3530 | d1 | main | region_leave | r1 | 112.953172 | 112.952303 | pack-objects | label:enumerate-objects Apr 09 16:41:52 16:41:52.091706 builtin/pack-objects.c:3542 | d1 | main | region_enter | r1 | 112.953179 | | pack-objects | label:write-pack-file Apr 09 16:41:52 16:41:52.111966 builtin/pack-objects.c:3544 | d1 | main | region_leave | r1 | 112.973438 | 0.020259 | pack-objects | label:write-pack-file I.e. almost all the time is in get_object_list_from_bitmap() and around 1m30s in just this in pack-bitmap.c: haves_bitmap = find_objects(bitmap_git, revs, haves, NULL); And then another ~20s in: wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap); This is with 10 packs and where only the largest (initial clone pack) had a *.bitmap, but I can also reproduce with a 'git repack -A -d -b', i.e. with only one pack with a *.bitmap, although that makes it a bit better for the first bit, and almost completely cuts down on the time spent in the second phase: Apr 09 17:08:37 17:08:37.261507 builtin/pack-objects.c:3493 | d1 | main | region_enter | r1 | 0.000922 | | pack-objects | label:enumerate-objects Apr 09 17:08:37 17:08:37.261527 builtin/pack-objects.c:3495 | d1 | main | region_enter | r1 | 0.000943 | | pack-objects | ..label:enumerate-objects-prepare-packing-data Apr 09 17:08:37 17:08:37.261600 builtin/pack-objects.c:3498 | d1 | main | region_leave | r1 | 0.001015 | 0.000072 | pack-objects | ..label:enumerate-objects-prepare-packing-data Apr 09 17:08:37 17:08:37.261608 builtin/pack-objects.c:3510 | d1 | main | region_enter | r1 | 0.001024 | | pack-objects | ..label:enumerate-objects-get-obj-list Apr 09 17:08:37 get obj list 1 Apr 09 17:08:37 get obj list 2, did 29380 lines Apr 09 17:08:37 get obj list 3 Apr 09 17:08:37 prepping walk Apr 09 17:08:37 opening packed bitmap... Apr 09 17:08:37 opening packed bitmap done Apr 09 17:08:37 walking 29380 pending Apr 09 17:08:37 done walking 29380 pending Apr 09 17:08:37 prepare_bitmap_walk 3 Apr 09 17:08:37 prepare_bitmap_walk 4 Apr 09 17:08:37 prepare_bitmap_walk 5 Apr 09 17:08:38 prepare_bitmap_walk 6 Apr 09 17:08:38 prepare_bitmap_walk 6.1 Apr 09 17:09:07 prepare_bitmap_walk 6.2 Apr 09 17:09:07 prepare_bitmap_walk 7 Apr 09 17:09:09 prepare_bitmap_walk 8 Apr 09 17:09:09 walking? Apr 09 17:09:09 traversing Apr 09 17:09:09 traversing done Apr 09 17:09:09 17:09:09.229185 builtin/pack-objects.c:3514 | d1 | main | region_leave | r1 | 31.968595 | 31.967571 | pack-objects | ..label:enumerate-objects-get-obj-list Apr 09 17:09:09 17:09:09.229203 builtin/pack-objects.c:3517 | d1 | main | region_enter | r1 | 31.968619 | | pack-objects | ..label:enumerate-objects-cleanup-preferred-base Apr 09 17:09:09 17:09:09.229214 builtin/pack-objects.c:3520 | d1 | main | region_leave | r1 | 31.968630 | 0.000011 | pack-objects | ..label:enumerate-objects-cleanup-preferred-base Apr 09 17:09:09 17:09:09.229242 builtin/pack-objects.c:3530 | d1 | main | region_leave | r1 | 31.968658 | 31.967736 | pack-objects | label:enumerate-objects Apr 09 17:09:09 17:09:09.229265 builtin/pack-objects.c:3542 | d1 | main | region_enter | r1 | 31.968681 | | pack-objects | label:write-pack-file Apr 09 17:09:09 17:09:09.251998 builtin/pack-objects.c:3544 | d1 | main | region_leave | r1 | 31.991412 | 0.022731 | pack-objects | label:write-pack-file I don't have time to dig more into this now, just wanted to send these initial results... 1. diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index a154fc29f6..8b2af1740e 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -3052,2 +3052,3 @@ static int get_object_list_from_bitmap(struct rev_info *revs) { + fprintf(stderr, "prepping walk\n"); if (!(bitmap_git = prepare_bitmap_walk(revs))) @@ -3055,2 +3056,3 @@ static int get_object_list_from_bitmap(struct rev_info *revs) + fprintf(stderr, "walking?\n"); if (pack_options_allow_reuse() && @@ -3066,3 +3068,5 @@ static int get_object_list_from_bitmap(struct rev_info *revs) + fprintf(stderr, "traversing\n"); traverse_bitmap_commit_list(bitmap_git, &add_object_entry_from_bitmap); + fprintf(stderr, "traversing done\n"); return 0; @@ -3091,2 +3095,3 @@ static void get_object_list(int ac, const char **av) int save_warning; + int lns = 0; @@ -3102,3 +3107,5 @@ static void get_object_list(int ac, const char **av) + fprintf(stderr, "get obj list 1\n"); while (fgets(line, sizeof(line), stdin) != NULL) { + lns++; int len = strlen(line); @@ -3128,4 +3135,6 @@ static void get_object_list(int ac, const char **av) + fprintf(stderr, "get obj list 2, did %d lines\n", lns); warn_on_object_refname_ambiguity = save_warning; + fprintf(stderr, "get obj list 3\n"); if (use_bitmap_index && !get_object_list_from_bitmap(&revs)) @@ -3133,2 +3142,3 @@ static void get_object_list(int ac, const char **av) + fprintf(stderr, "get obj list 4\n"); if (use_delta_islands) @@ -3136,2 +3146,3 @@ static void get_object_list(int ac, const char **av) + fprintf(stderr, "get obj list 5\n"); if (prepare_revision_walk(&revs)) @@ -3140,2 +3151,3 @@ static void get_object_list(int ac, const char **av) + fprintf(stderr, "get obj list 6\n"); if (!fn_show_object) @@ -3146,2 +3158,3 @@ static void get_object_list(int ac, const char **av) + fprintf(stderr, "get obj list 7\n"); if (unpack_unreachable_expiration) { @@ -3157,2 +3170,3 @@ static void get_object_list(int ac, const char **av) + fprintf(stderr, "get obj list 8\n"); if (keep_unreachable) @@ -3163,2 +3177,3 @@ static void get_object_list(int ac, const char **av) loosen_unused_packed_objects(&revs); + fprintf(stderr, "get obj list 9\n"); @@ -3478,3 +3493,7 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) the_repository); + trace2_region_enter("pack-objects", "enumerate-objects-prepare-packing-data", + the_repository); prepare_packing_data(the_repository, &to_pack); + trace2_region_leave("pack-objects", "enumerate-objects-prepare-packing-data", + the_repository); @@ -3482,11 +3501,28 @@ int cmd_pack_objects(int argc, const char **argv, const char *prefix) progress_state = start_progress(_("Enumerating objects"), 0); - if (!use_internal_rev_list) + if (!use_internal_rev_list) { + trace2_region_enter("pack-objects", "enumerate-objects-read-stdin", + the_repository); read_object_list_from_stdin(); - else { + trace2_region_leave("pack-objects", "enumerate-objects-read-stdin", + the_repository); + } else { + trace2_region_enter("pack-objects", "enumerate-objects-get-obj-list", + the_repository); get_object_list(rp.argc, rp.argv); argv_array_clear(&rp); + trace2_region_leave("pack-objects", "enumerate-objects-get-obj-list", + the_repository); } + trace2_region_enter("pack-objects", "enumerate-objects-cleanup-preferred-base", + the_repository); cleanup_preferred_base(); - if (include_tag && nr_result) + trace2_region_leave("pack-objects", "enumerate-objects-cleanup-preferred-base", + the_repository); + if (include_tag && nr_result) { + trace2_region_enter("pack-objects", "enumerate-objects-add-tags", + the_repository); for_each_ref(add_ref_tag, NULL); + trace2_region_leave("pack-objects", "enumerate-objects-add-tags", + the_repository); + } stop_progress(&progress_state); diff --git a/pack-bitmap.c b/pack-bitmap.c index 4695aaf6b4..0ab71597fd 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -693,5 +693,8 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) * because we may not need to use it */ + fprintf(stderr, "opening packed bitmap...\n"); if (open_pack_bitmap(revs->repo, bitmap_git) < 0) goto cleanup; + fprintf(stderr, "opening packed bitmap done\n"); + fprintf(stderr, "walking %d pending\n", revs->pending.nr); for (i = 0; i < revs->pending.nr; ++i) { @@ -720,2 +723,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) } + fprintf(stderr, "done walking %d pending\n", revs->pending.nr); @@ -726,2 +730,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) */ + fprintf(stderr, "prepare_bitmap_walk 3\n"); if (haves && !in_bitmapped_pack(bitmap_git, haves)) @@ -729,2 +734,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) + fprintf(stderr, "prepare_bitmap_walk 4\n"); /* if we don't want anything, we're done here */ @@ -738,2 +744,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) */ + fprintf(stderr, "prepare_bitmap_walk 5\n"); if (load_pack_bitmap(bitmap_git) < 0) @@ -741,2 +748,3 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) + fprintf(stderr, "prepare_bitmap_walk 6\n"); object_array_clear(&revs->pending); @@ -745,3 +753,5 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) revs->ignore_missing_links = 1; + fprintf(stderr, "prepare_bitmap_walk 6.1\n"); haves_bitmap = find_objects(bitmap_git, revs, haves, NULL); + fprintf(stderr, "prepare_bitmap_walk 6.2\n"); reset_revision_walk(); @@ -752,4 +762,6 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) } + fprintf(stderr, "prepare_bitmap_walk 7\n"); wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap); + fprintf(stderr, "prepare_bitmap_walk 8\n"); diff --git a/revision.c b/revision.c index eb8e51bc63..4592d01ee7 100644 --- a/revision.c +++ b/revision.c @@ -63,2 +63,4 @@ static void mark_tree_contents_uninteresting(struct repository *r, + fprintf(stderr, "MTCU\n"); + if (parse_tree_gently(tree, 1) < 0) @@ -167,2 +169,4 @@ static void add_children_by_path(struct repository *r, + fprintf(stderr, "ACBP\n"); + if (!tree)
On Tue, Apr 09, 2019 at 05:10:43PM +0200, Ævar Arnfjörð Bjarmason wrote: > I've found a case where turning bitmaps on does horrible things for > bitmap "push" performance. > [...] > I can't share the repo, but I had a report where just a "git push" of a > topic branch that was 2/58 ahead/behind took ~2 minutes just in > "Enumerating objects", but ~500ms without bitmaps. That's pretty bad, though I'm not terribly surprised. The worst cases are ones where we have to traverse a lot to fill in the bitmap. So either there are a lot of commits newer than the bitmapped pack, or we've done a bad job of picking old commits to bitmap, requiring us to walk back to find all of the reachable objects (until we find something that does have a bitmap). And "bad" here is somewhat subjective. The other side told us some "have" lines, and those are what we have to walk back from. _Usually_ those would correspond to ref tips, and the bitmap code tries to put a bitmap at each ref tip. But if you have tons of refs, it can't always do so. > I.e. almost all the time is in get_object_list_from_bitmap() and around > 1m30s in just this in pack-bitmap.c: > > haves_bitmap = find_objects(bitmap_git, revs, haves, NULL); > > And then another ~20s in: > > wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap); Yeah, that's where I'd expect the time to go. Inside find_objects() we'll call traverse_commit_list() to do the actual walk. The first walk for "haves" is looking mostly at old commits. Stuff that is mentioned in the pack, but for which we have to walk to find the bitmap. The second walk for wants is probably mostly looking at new commits. Things that don't have a bitmap yet, and for which we have to walk (though it's interesting that it's so much more expensive than the non-bitmap walk, which has to fully enumerate those trees itself; that implies that some of the "haves" are recent, too, with respect to the last pack). > This is with 10 packs and where only the largest (initial clone pack) > had a *.bitmap, but I can also reproduce with a 'git repack -A -d -b', > i.e. with only one pack with a *.bitmap, although that makes it a bit > better for the first bit, and almost completely cuts down on the time > spent in the second phase: Yeah, that makes sense. By repacking you've taken all those new commits and included them in on-disk bitmaps. So I'd expect the "wants" to get much shorter, but the "haves" phase staying long means we could do a better job of picking commits to have on-disk bitmaps. So two avenues for exploration I think: 1. I've long suspected that the bitmap selection code isn't ideal. Both in terms of what it picks, but also in its runtime (I think it ends up walking the same slices of history multiple times in some cases). 2. The answer we get from a bitmap versus a regular traversal are not apples-to-apples equivalent. The regular traversal walks down to the UNINTERESTING commits, marks the boundaries trees and blobs as UNINTERESTING, and then adds in all the interesting trees and blobs minus the UNINTERESTING parts. So it can sometimes give the wrong answer, claiming something is interesting when it is not. Whereas with bitmaps we fill in the trees and blobs as we walk, and you get the true answer. But it means we may open up a lot more trees than the equivalent traversal would. So one thing I've never really experimented with (and indeed, never really thought about until writing this email) is that the bitmaps could try to do that looser style of traversal, knowing that we might err on the side of calling things interesting in a few cases. But hopefully spending a lot less time opening trees. I'm not even 100% sure what that would look like in code, but just thinking about it from a high level, I don't there's a particular mathematical reason it couldn't work. -Peff
Jeff King <peff@peff.net> writes: > On Tue, Apr 09, 2019 at 05:10:43PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> I've found a case where turning bitmaps on does horrible things for >> bitmap "push" performance. >> [...] >> I can't share the repo, but I had a report where just a "git push" of a >> topic branch that was 2/58 ahead/behind took ~2 minutes just in >> "Enumerating objects", but ~500ms without bitmaps. > > That's pretty bad, though I'm not terribly surprised. I was revisiting the recent "What's cooking" report, and I am not sure what the current status of the topic is. I do not get a feel that the current bitmap implementation has been widely used in repositories that have vastly different access patterns---it probably has been tried only by those who can afford the engineering cost to see if the implementation happens to work well for their workload and some may have chosen to adopt it while others didn't. So it may be very well tuned for the former people but once we merge this topic down, we'll hear from others with quite different workload, which may lead to us tuning the code to bit better to their workload while not hurting other existing users, hopefully. Or not. I am somewhat tempted to make things more exciting by merging it to 'next' soonish, but I guess Ævar and you are not quite ready for that excitement yet, judging from the following (which looks quite sensible suggestions)? Thanks. > Yeah, that makes sense. By repacking you've taken all those new commits > and included them in on-disk bitmaps. So I'd expect the "wants" to get > much shorter, but the "haves" phase staying long means we could do a > better job of picking commits to have on-disk bitmaps. > > So two avenues for exploration I think: > > 1. I've long suspected that the bitmap selection code isn't ideal. > Both in terms of what it picks, but also in its runtime (I think it > ends up walking the same slices of history multiple times in some > cases). > > 2. The answer we get from a bitmap versus a regular traversal are not > apples-to-apples equivalent. The regular traversal walks down to > the UNINTERESTING commits, marks the boundaries trees and blobs as > UNINTERESTING, and then adds in all the interesting trees and blobs > minus the UNINTERESTING parts. So it can sometimes give the wrong > answer, claiming something is interesting when it is not. > > Whereas with bitmaps we fill in the trees and blobs as we walk, and > you get the true answer. But it means we may open up a lot more > trees than the equivalent traversal would. > > So one thing I've never really experimented with (and indeed, never > really thought about until writing this email) is that the bitmaps > could try to do that looser style of traversal, knowing that we > might err on the side of calling things interesting in a few cases. > But hopefully spending a lot less time opening trees. > > I'm not even 100% sure what that would look like in code, but just > thinking about it from a high level, I don't there's a particular > mathematical reason it couldn't work. > > -Peff
On Thu, Apr 25, 2019 at 04:16:46PM +0900, Junio C Hamano wrote: > I was revisiting the recent "What's cooking" report, and I am not > sure what the current status of the topic is. > > I do not get a feel that the current bitmap implementation has been > widely used in repositories that have vastly different access > patterns---it probably has been tried only by those who can afford > the engineering cost to see if the implementation happens to work > well for their workload and some may have chosen to adopt it while > others didn't. So it may be very well tuned for the former people > but once we merge this topic down, we'll hear from others with quite > different workload, which may lead to us tuning the code to bit > better to their workload while not hurting other existing users, > hopefully. > > Or not. Note that Ævar's case was somebody running bitmaps locally and trying to push, which I think is generally not a good match for bitmaps (even when they work, they cost more to generate than what you save if you're only pushing once). The goal of Eric's patch was that by kicking in for bare repos, we'd mostly be hitting servers that are serving up fetches. So if by "workload" you mean that we some people might use bare repos for other cases, yeah, there's a potential for confusion or regression there. If you mean that bitmaps might not work for some workloads even when we're serving a lot of fetches, I won't say that's _not_ true, but my experience is that they are generally a net win. Both for the smaller repositories we see on github.com, but also for big, busy ones that our on-premises customers use. Actually, there is one curiosity with Eric's patch that I haven't tested. As I've mentioned before, we store "forks" as single repositories pointing to a single shared alternates repository. Since the bitmap code only handles one .bitmap per invocation, you really want just one big one in the shared repo. If "git repack" in the forks started generating one, that would be surprising and annoying. In practice this is a pretty extreme corner case. And a lot would depend on how you're using "repack" in the fork (e.g., a partial repack would know that it can't generate bitmaps anyway). I'm pretty sure it would not even impact our setup at all, but I can probably come up with a devils advocate one where it would. > I am somewhat tempted to make things more exciting by merging it to > 'next' soonish, but I guess Ævar and you are not quite ready for > that excitement yet, judging from the following (which looks quite > sensible suggestions)? It's OK with me for this to go to 'next'. Note that the other two patches from me could actually graduate separately. One is a straight-out test fix, and the other should always be a win (and does nothing if you're not already generating bitmaps). By the way, there were some timing puzzles mentioned in that second commit. I re-ran them today and everything was what I'd expect. So I wonder if I just screwed up the timings before. I can re-write that commit message if it hasn't made it to 'next' yet. -Peff
On Sat, May 04 2019, Jeff King wrote: > On Thu, Apr 25, 2019 at 04:16:46PM +0900, Junio C Hamano wrote: > >> I was revisiting the recent "What's cooking" report, and I am not >> sure what the current status of the topic is. >> >> I do not get a feel that the current bitmap implementation has been >> widely used in repositories that have vastly different access >> patterns---it probably has been tried only by those who can afford >> the engineering cost to see if the implementation happens to work >> well for their workload and some may have chosen to adopt it while >> others didn't. So it may be very well tuned for the former people >> but once we merge this topic down, we'll hear from others with quite >> different workload, which may lead to us tuning the code to bit >> better to their workload while not hurting other existing users, >> hopefully. >> >> Or not. > > Note that Ævar's case was somebody running bitmaps locally and trying to > push, which I think is generally not a good match for bitmaps (even when > they work, they cost more to generate than what you save if you're only > pushing once). Right. It was *not* caused by this "enable bitmaps by default on bare repos" patch (which I wasn't even running with), but *is* indicative of a pretty big edge case with enabling bitmaps that *will* happen for some on such bare repos if we ship the patch. > The goal of Eric's patch was that by kicking in for bare repos, we'd > mostly be hitting servers that are serving up fetches. So if by > "workload" you mean that we some people might use bare repos for other > cases, yeah, there's a potential for confusion or regression there. As noted I suspect that's a really rare use-case in practice, and in reply to Junio's <xmqq1s1qy2ox.fsf@gitster-ct.c.googlers.com> upthread I think it's fine to just try this. Maybe we'll finally get such use-cases out of the woodworks by turning it on by default. As an aside this is the Nth time I notice how crappy that "Enumerating objects" progress bar is. We do a *lot* of things there, including this bitmap calculation. But just splitting it up might result in either no progress (all individually below 2 seconds), or a lot of noise as you have 20 things that each take 2 seconds. I wonder if someone's looked at supporting: Enumerating Objects (X%) => Calculating bitmaps (Y%) Where X% is the total progres, and %Y is the sub-progress. I eyeballed doing this once by "chaining" the progress structs, but there's probably a less crappy way... > If you mean that bitmaps might not work for some workloads even when > we're serving a lot of fetches, I won't say that's _not_ true, but my > experience is that they are generally a net win. Both for the smaller > repositories we see on github.com, but also for big, busy ones that our > on-premises customers use. > > Actually, there is one curiosity with Eric's patch that I haven't > tested. As I've mentioned before, we store "forks" as single > repositories pointing to a single shared alternates repository. Since > the bitmap code only handles one .bitmap per invocation, you really > want just one big one in the shared repo. If "git repack" in the forks > started generating one, that would be surprising and annoying. > > In practice this is a pretty extreme corner case. And a lot would > depend on how you're using "repack" in the fork (e.g., a partial > repack would know that it can't generate bitmaps anyway). I'm pretty > sure it would not even impact our setup at all, but I can probably > come up with a devils advocate one where it would. > >> I am somewhat tempted to make things more exciting by merging it to >> 'next' soonish, but I guess Ævar and you are not quite ready for >> that excitement yet, judging from the following (which looks quite >> sensible suggestions)? > > It's OK with me for this to go to 'next'. Note that the other two > patches from me could actually graduate separately. One is a > straight-out test fix, and the other should always be a win (and does > nothing if you're not already generating bitmaps). > > By the way, there were some timing puzzles mentioned in that second > commit. I re-ran them today and everything was what I'd expect. So I > wonder if I just screwed up the timings before. I can re-write that > commit message if it hasn't made it to 'next' yet. > > -Peff
On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote: > As an aside this is the Nth time I notice how crappy that "Enumerating > objects" progress bar is. And don't forget that the commit-graph progress bar still prints nonsense :) https://public-inbox.org/git/87ef6ydds8.fsf@evledraar.gmail.com/
On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote: > > Note that Ævar's case was somebody running bitmaps locally and trying to > > push, which I think is generally not a good match for bitmaps (even when > > they work, they cost more to generate than what you save if you're only > > pushing once). > > Right. It was *not* caused by this "enable bitmaps by default on bare > repos" patch (which I wasn't even running with), but *is* indicative of > a pretty big edge case with enabling bitmaps that *will* happen for some > on such bare repos if we ship the patch. Yeah. To clarify my comments a bit: I do think it would be possible to hit a weird case like this while serving fetches (i.e., as far as I know there is nothing in what you saw that is inherent to pushes). But I do think for serving fetches, bitmaps are overall a big net win (based on my experiences). So I think it may come down to a tradeoff: enabling this by default would probably be a net win across the population, but that's little comfort to the unlucky somebody who may see it as a regression. I'm not sure which is more important to maintain. > As an aside this is the Nth time I notice how crappy that "Enumerating > objects" progress bar is. We do a *lot* of things there, including this > bitmap calculation. > > But just splitting it up might result in either no progress (all > individually below 2 seconds), or a lot of noise as you have 20 things > that each take 2 seconds. I wonder if someone's looked at supporting: > > Enumerating Objects (X%) => Calculating bitmaps (Y%) > > Where X% is the total progres, and %Y is the sub-progress. I eyeballed > doing this once by "chaining" the progress structs, but there's probably > a less crappy way... I don't think it needs to be split; I think we just need to update the object count while we're traversing the bitmaps. The problem is that the progress object is known to pack-objects.c. Without bitmaps, as the revision machinery walks the graph, our callbacks bump the progress meter every time we see an object. With bitmaps, all that walking happens behind the scenes, and the bitmap code delivers us the final answer. So we pause for a long time, and then suddenly it shoots forward. I think we'd want a way to tell the bitmap code to update our progress meter as it traverses (both single objects, but also taking into account when it finds a bitmap and then suddenly bumps the value by a large amount). -Peff
On Tue, May 07 2019, Jeff King wrote: > On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > Note that Ævar's case was somebody running bitmaps locally and trying to >> > push, which I think is generally not a good match for bitmaps (even when >> > they work, they cost more to generate than what you save if you're only >> > pushing once). >> >> Right. It was *not* caused by this "enable bitmaps by default on bare >> repos" patch (which I wasn't even running with), but *is* indicative of >> a pretty big edge case with enabling bitmaps that *will* happen for some >> on such bare repos if we ship the patch. > > Yeah. To clarify my comments a bit: I do think it would be possible to > hit a weird case like this while serving fetches (i.e., as far as I know > there is nothing in what you saw that is inherent to pushes). But I do > think for serving fetches, bitmaps are overall a big net win (based on > my experiences). > > So I think it may come down to a tradeoff: enabling this by default > would probably be a net win across the population, but that's little > comfort to the unlucky somebody who may see it as a regression. I'm not > sure which is more important to maintain. > >> As an aside this is the Nth time I notice how crappy that "Enumerating >> objects" progress bar is. We do a *lot* of things there, including this >> bitmap calculation. >> >> But just splitting it up might result in either no progress (all >> individually below 2 seconds), or a lot of noise as you have 20 things >> that each take 2 seconds. I wonder if someone's looked at supporting: >> >> Enumerating Objects (X%) => Calculating bitmaps (Y%) >> >> Where X% is the total progres, and %Y is the sub-progress. I eyeballed >> doing this once by "chaining" the progress structs, but there's probably >> a less crappy way... > > I don't think it needs to be split; I think we just need to update the > object count while we're traversing the bitmaps. The problem is that the > progress object is known to pack-objects.c. Without bitmaps, as the > revision machinery walks the graph, our callbacks bump the progress > meter every time we see an object. > > With bitmaps, all that walking happens behind the scenes, and the bitmap > code delivers us the final answer. So we pause for a long time, and then > suddenly it shoots forward. > > I think we'd want a way to tell the bitmap code to update our progress > meter as it traverses (both single objects, but also taking into account > when it finds a bitmap and then suddenly bumps the value by a large > amount). Not splitting it will fix the progress bar stalling, so it fixes the problem that the user is wondering if the command is entirely hanging. But I was hoping to give the user an idea of roughly where we're spending our time, e.g. so you can see how much the pack.useSparse setting is helping (or not). So something where we report sub-progress as we go along, and perhaps print some brief summary at the end if it took long enough, e.g.: Enumerating Objects (X^1%) => Marking trees (Y^1%) Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%) And at the end: Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other) I.e. bringing the whole "nested" trace2 regions full circle with the progress bar where we could elect to trace/show some of that info, and then you could turn on some trace2 mode/verbose progress to see more.
On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote: > > I think we'd want a way to tell the bitmap code to update our progress > > meter as it traverses (both single objects, but also taking into account > > when it finds a bitmap and then suddenly bumps the value by a large > > amount). > > Not splitting it will fix the progress bar stalling, so it fixes the > problem that the user is wondering if the command is entirely hanging. > > But I was hoping to give the user an idea of roughly where we're > spending our time, e.g. so you can see how much the pack.useSparse > setting is helping (or not). Yeah, I think that's a bigger and more complicated problem. I admit that my main annoyance is just the stall while we fill in the bitmaps (and it's easy because the bitmap traversal is the same unit of work as a regular traversal). > So something where we report sub-progress as we go along, and perhaps > print some brief summary at the end if it took long enough, e.g.: > > Enumerating Objects (X^1%) => Marking trees (Y^1%) > Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%) > > And at the end: > > Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other) > > I.e. bringing the whole "nested" trace2 regions full circle with the > progress bar where we could elect to trace/show some of that info, and > then you could turn on some trace2 mode/verbose progress to see more. I do wonder if this really needs to be part of the progress bar. The goal of the progress bar is to give the user a sense that work is happening, and (if possible, but not for "enumerating") an idea of when it might finish. If the trace code can already do detailed timings, then shouldn't we just be encouraging people to use that? -Peff
On 5/8/2019 3:11 AM, Jeff King wrote: > On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote: > >>> I think we'd want a way to tell the bitmap code to update our progress >>> meter as it traverses (both single objects, but also taking into account >>> when it finds a bitmap and then suddenly bumps the value by a large >>> amount). >> >> Not splitting it will fix the progress bar stalling, so it fixes the >> problem that the user is wondering if the command is entirely hanging. >> >> But I was hoping to give the user an idea of roughly where we're >> spending our time, e.g. so you can see how much the pack.useSparse >> setting is helping (or not). > > Yeah, I think that's a bigger and more complicated problem. I admit that > my main annoyance is just the stall while we fill in the bitmaps (and > it's easy because the bitmap traversal is the same unit of work as a > regular traversal). The pack.useSparse setting also speeds up a section that is not marked by progress: that portion usually is walking all UNINTERESTING trees and the"Enumerating Objects" progress is just for walking the INTERESTING objects. >> So something where we report sub-progress as we go along, and perhaps >> print some brief summary at the end if it took long enough, e.g.: >> >> Enumerating Objects (X^1%) => Marking trees (Y^1%) >> Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%) I like this idea for splitting the "normal" mechanism, too: Enumerating Objects (X^1%) => Marking trees (Y^1%) Enumerating Objects (X^2%) => Enumerating objects to pack (Y^2%) >> I.e. bringing the whole "nested" trace2 regions full circle with the >> progress bar where we could elect to trace/show some of that info, and >> then you could turn on some trace2 mode/verbose progress to see more. > > I do wonder if this really needs to be part of the progress bar. The > goal of the progress bar is to give the user a sense that work is > happening, and (if possible, but not for "enumerating") an idea of when > it might finish. If the trace code can already do detailed timings, then > shouldn't we just be encouraging people to use that? The problem I've seen (without bitmaps) is that running `git push` can take a while before _any_ progress is listed. Good news is: `pack.useSparse` fixed our push problem in the Windows OS repo. The end-to-end time for `git push` sped up by 7.7x with the change, and this "blank" time is too fast for users to notice. Updating the progress could help in cases without pack.useSparse. Thanks, -Stolee
On Wed, May 08 2019, Jeff King wrote: > On Tue, May 07, 2019 at 10:12:06AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > I think we'd want a way to tell the bitmap code to update our progress >> > meter as it traverses (both single objects, but also taking into account >> > when it finds a bitmap and then suddenly bumps the value by a large >> > amount). >> >> Not splitting it will fix the progress bar stalling, so it fixes the >> problem that the user is wondering if the command is entirely hanging. >> >> But I was hoping to give the user an idea of roughly where we're >> spending our time, e.g. so you can see how much the pack.useSparse >> setting is helping (or not). > > Yeah, I think that's a bigger and more complicated problem. I admit that > my main annoyance is just the stall while we fill in the bitmaps (and > it's easy because the bitmap traversal is the same unit of work as a > regular traversal). > >> So something where we report sub-progress as we go along, and perhaps >> print some brief summary at the end if it took long enough, e.g.: >> >> Enumerating Objects (X^1%) => Marking trees (Y^1%) >> Enumerating Objects (X^2%) => Calculating bitmaps (Y^2%) >> >> And at the end: >> >> Enumerating Objects (100%) in ~2m30s -- (~10s marking trees, ~2m10s bitmaps, ~10s other) >> >> I.e. bringing the whole "nested" trace2 regions full circle with the >> progress bar where we could elect to trace/show some of that info, and >> then you could turn on some trace2 mode/verbose progress to see more. > > I do wonder if this really needs to be part of the progress bar. The > goal of the progress bar is to give the user a sense that work is > happening, and (if possible, but not for "enumerating") an idea of when > it might finish. If the trace code can already do detailed timings, then > shouldn't we just be encouraging people to use that? To just show work happening we could save ourselves some horizontal space and the debates over counting v.s. enumerating with: diff --git a/progress.c b/progress.c index 0318bdd41b..83336ca391 100644 --- a/progress.c +++ b/progress.c @@ -226,3 +226,3 @@ static struct progress *start_progress_delay(const char *title, uint64_t total, struct progress *progress = xmalloc(sizeof(*progress)); - progress->title = title; + progress->title = "Reticulating splines"; progress->total = total; :) Obviously that's silly, but the point is we do show some user messaging with these now, and e.g. the other day here on-list (can't be bothered to find the msgid) someone was lamenting that the N progressbars we show on "push" were too verbose. So by coalescing some of the existing bars that do one logical operation (push) in N steps we could be less verbose without losing the "stuff's happening" part of it, and would see if something odd was going on, e.g. the "I/O write" part being proportionally slower on this box than the other, or when they upgrade bitmaps suddenly showing up as >95% of the time. The bit I find interesting about tying it into trace2 is that once you do that the trace logs can contain e.g. min/max/avg/median/percentile time for doing some operation we can break into N steps same/similar steps, which might be interesting for performance analysis.
On Sat, May 04 2019, SZEDER Gábor wrote: > On Sat, May 04, 2019 at 08:52:01AM +0200, Ævar Arnfjörð Bjarmason wrote: >> As an aside this is the Nth time I notice how crappy that "Enumerating >> objects" progress bar is. > > And don't forget that the commit-graph progress bar still prints > nonsense :) > > https://public-inbox.org/git/87ef6ydds8.fsf@evledraar.gmail.com/ I forgot for a bit, but then figured I'd get out of Derrick's way with his more significant commit-graph work (which would have inevitably had merge conflicts with this patch), and look at the time. Here we are coming up on rc0. I'm inclined to just wait until Derrick's refactorings land post-2.22.0 unless a) we need it enough before 2.22.0 to cause him trouble b) Junio agrees with "a" and would take such a patch to fix this part of the progress output before 2.22.0. If people (just you would do) tell me "yes I/we want it" and Junio says "yeah I'll do that" I'll write this patch in the next couple of days, otherwise I'll do other stuff. Junio?
On Wed, May 08, 2019 at 06:13:58PM +0200, Ævar Arnfjörð Bjarmason wrote: > > I do wonder if this really needs to be part of the progress bar. The > > goal of the progress bar is to give the user a sense that work is > > happening, and (if possible, but not for "enumerating") an idea of when > > it might finish. If the trace code can already do detailed timings, then > > shouldn't we just be encouraging people to use that? > [...] > - progress->title = title; > + progress->title = "Reticulating splines"; Heh, OK, that's a fair reductio ad absurdum of my point. I guess what I was trying to say is that we don't need to go overboard with accuracy as long as we're giving a vague sense of the work. Precision measurements can be handled through a different system. But I don't mind if you can find a way to do these kind of cascaded progress meters in a way that is pleasing to the eye and not hard to handle in the callers of the code. -Peff
Ævar Arnfjörð Bjarmason <avarab@gmail.com> writes: > On Sat, May 04 2019, SZEDER Gábor wrote: > ... >> And don't forget that the commit-graph progress bar still prints >> nonsense :) > > I'm inclined to just wait until Derrick's refactorings land post-2.22.0 Fine by me. Thanks.
On Wed, Apr 10, 2019 at 06:57:21PM -0400, Jeff King wrote: > 2. The answer we get from a bitmap versus a regular traversal are not > apples-to-apples equivalent. The regular traversal walks down to > the UNINTERESTING commits, marks the boundaries trees and blobs as > UNINTERESTING, and then adds in all the interesting trees and blobs > minus the UNINTERESTING parts. So it can sometimes give the wrong > answer, claiming something is interesting when it is not. > > Whereas with bitmaps we fill in the trees and blobs as we walk, and > you get the true answer. But it means we may open up a lot more > trees than the equivalent traversal would. > > So one thing I've never really experimented with (and indeed, never > really thought about until writing this email) is that the bitmaps > could try to do that looser style of traversal, knowing that we > might err on the side of calling things interesting in a few cases. > But hopefully spending a lot less time opening trees. > > I'm not even 100% sure what that would look like in code, but just > thinking about it from a high level, I don't there's a particular > mathematical reason it couldn't work. I spent a while thinking and experimenting with this tonight. The result is the patch below. Ævar, do you still have a copy of the repo that misbehaved? I'd be curious to see how it fares. Finding the right trees to explore is a little tricky with bitmaps. In a normal traversal, we consider the "edges" to be worth exploring. I.e., the places where an UNINTERESTING commit is the parent of an interesting one. But with bitmaps, we don't have that information in the same way, because we're trying to avoid walking the commits in the first place. So e.g., in a history like this: A--B--C \ D Let's imagine we're computing "C..D", and that D has a bitmap on disk but C does not. When we visit D, we'll see that it has a bitmap, fill in the results in our cumulative "want" bitmap, and then stop walking its parents (because we know everything they could tell us already). Then we visit C. It's not bitmapped, so we keep walking. Unlike a normal traversal, we can't stop walking even though there are only UNINTERESTING commits left, because we have to fill the complete bitmap, including A and B (and you can imagine there might be thousands of ancestors of A, too). The reason is that we skipped adding ancestors of D when we saw its bitmap, so no still_interesting() cutoff will work there. But how do we know that it's worth looking into the tree of "B"? If we assume we visit commits before their ancestors (because of the commit timestamp ordering), then we'd see that "B" is in the "want" bitmap already, which makes it worth visiting its tree. Unfortunately we'd then continue on to "A", and it would _also_ look interesting, because it's also in the "want" bitmap. We don't have an easy way of knowing that "A" is basically already covered by "B", and is therefore not worth pursuing. In this example, we could pass down a bit from B to A as we traverse. But you can imagine another interesting commit branched from A, which _would_ make A's tree worth considering. Fundamentally, as soon as we load a bitmap and stop traversing, we lose all information about the graph structure. So the patch below just looks at every tree that might be worth exploring (so both "A" and "B" here, but not "C"). That shouldn't be any _worse_ than what the current code does (because it looks at all the trees). It appears to behave correctly, at least so far as passing the test suite. Running p5310 and p5311 against git.git and linux.git, it seems to perform worse. I'm not quite sure why that is (i.e., if I screwed up something in the implementation, or if the algorithm is fundamentally worse). There are a lot of rough edges in the patch; I was just trying to see if the idea was even viable. So don't bother reviewing it as a real proposal for inclusion. If you do read it, I'd recommend just looking at the post-image, since it's essentially a rewrite and the diff is a bit messy. -Peff --- pack-bitmap.c | 398 ++++++++++++++++++++++++-------------------------- 1 file changed, 193 insertions(+), 205 deletions(-) diff --git a/pack-bitmap.c b/pack-bitmap.c index 6069b2fe55..4a40f62a38 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -12,6 +12,8 @@ #include "packfile.h" #include "repository.h" #include "object-store.h" +#include "blob.h" +#include "prio-queue.h" /* * An entry on the bitmap index, representing the bitmap for a given @@ -422,180 +424,22 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git, return bitmap_pos + bitmap_git->pack->num_objects; } -struct bitmap_show_data { - struct bitmap_index *bitmap_git; - struct bitmap *base; -}; - -static void show_object(struct object *object, const char *name, void *data_) -{ - struct bitmap_show_data *data = data_; - int bitmap_pos; - - bitmap_pos = bitmap_position(data->bitmap_git, &object->oid); - - if (bitmap_pos < 0) - bitmap_pos = ext_index_add_object(data->bitmap_git, object, - name); - - bitmap_set(data->base, bitmap_pos); -} - -static void show_commit(struct commit *commit, void *data) -{ -} - -static int add_to_include_set(struct bitmap_index *bitmap_git, - struct include_data *data, - const struct object_id *oid, - int bitmap_pos) -{ - khiter_t hash_pos; - - if (data->seen && bitmap_get(data->seen, bitmap_pos)) - return 0; - - if (bitmap_get(data->base, bitmap_pos)) - return 0; - - hash_pos = kh_get_oid_map(bitmap_git->bitmaps, *oid); - if (hash_pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, hash_pos); - bitmap_or_ewah(data->base, lookup_stored_bitmap(st)); - return 0; - } - - bitmap_set(data->base, bitmap_pos); - return 1; -} - -static int should_include(struct commit *commit, void *_data) -{ - struct include_data *data = _data; - int bitmap_pos; - - bitmap_pos = bitmap_position(data->bitmap_git, &commit->object.oid); - if (bitmap_pos < 0) - bitmap_pos = ext_index_add_object(data->bitmap_git, - (struct object *)commit, - NULL); - - if (!add_to_include_set(data->bitmap_git, data, &commit->object.oid, - bitmap_pos)) { - struct commit_list *parent = commit->parents; - - while (parent) { - parent->item->object.flags |= SEEN; - parent = parent->next; - } - - return 0; - } - - return 1; -} - -static struct bitmap *find_objects(struct bitmap_index *bitmap_git, - struct rev_info *revs, - struct object_list *roots, - struct bitmap *seen) +static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, + struct bitmap **base, struct commit *commit) { - struct bitmap *base = NULL; - int needs_walk = 0; - - struct object_list *not_mapped = NULL; - - /* - * Go through all the roots for the walk. The ones that have bitmaps - * on the bitmap index will be `or`ed together to form an initial - * global reachability analysis. - * - * The ones without bitmaps in the index will be stored in the - * `not_mapped_list` for further processing. - */ - while (roots) { - struct object *object = roots->item; - roots = roots->next; - - if (object->type == OBJ_COMMIT) { - khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, object->oid); - - if (pos < kh_end(bitmap_git->bitmaps)) { - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); - struct ewah_bitmap *or_with = lookup_stored_bitmap(st); - - if (base == NULL) - base = ewah_to_bitmap(or_with); - else - bitmap_or_ewah(base, or_with); - - object->flags |= SEEN; - continue; - } - } - - object_list_insert(object, ¬_mapped); - } - - /* - * Best case scenario: We found bitmaps for all the roots, - * so the resulting `or` bitmap has the full reachability analysis - */ - if (not_mapped == NULL) - return base; - - roots = not_mapped; - - /* - * Let's iterate through all the roots that don't have bitmaps to - * check if we can determine them to be reachable from the existing - * global bitmap. - * - * If we cannot find them in the existing global bitmap, we'll need - * to push them to an actual walk and run it until we can confirm - * they are reachable - */ - while (roots) { - struct object *object = roots->item; - int pos; - - roots = roots->next; - pos = bitmap_position(bitmap_git, &object->oid); - - if (pos < 0 || base == NULL || !bitmap_get(base, pos)) { - object->flags &= ~UNINTERESTING; - add_pending_object(revs, object, ""); - needs_walk = 1; - } else { - object->flags |= SEEN; - } - } - - if (needs_walk) { - struct include_data incdata; - struct bitmap_show_data show_data; - - if (base == NULL) - base = bitmap_new(); - - incdata.bitmap_git = bitmap_git; - incdata.base = base; - incdata.seen = seen; - - revs->include_check = should_include; - revs->include_check_data = &incdata; - - if (prepare_revision_walk(revs)) - die("revision walk setup failed"); + khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid); + if (pos < kh_end(bitmap_git->bitmaps)) { + struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); + struct ewah_bitmap *or_with = lookup_stored_bitmap(st); - show_data.bitmap_git = bitmap_git; - show_data.base = base; + if (*base == NULL) + *base = ewah_to_bitmap(or_with); + else + bitmap_or_ewah(*base, or_with); - traverse_commit_list(revs, show_commit, show_object, - &show_data); + return 1; } - - return base; + return 0; } static void show_extended_objects(struct bitmap_index *bitmap_git, @@ -665,24 +509,122 @@ static void show_objects_for_type( } } -static int in_bitmapped_pack(struct bitmap_index *bitmap_git, - struct object_list *roots) +static void add_object_to_bitmap(struct bitmap_index *bitmap_git, + struct object *obj, + struct bitmap *bitmap, + struct bitmap *seen, + int ignore_missing); + +static void add_tag_to_bitmap(struct bitmap_index *bitmap_git, + struct tag *tag, + struct bitmap *bitmap, + struct bitmap *seen, + int ignore_missing) +{ + if (parse_tag(tag) < 0 || !tag->tagged) { + if (ignore_missing) + return; + die("unable to parse tag %s", oid_to_hex(&tag->object.oid)); + } + add_object_to_bitmap(bitmap_git, tag->tagged, bitmap, seen, + ignore_missing); +} + +static void add_tree_to_bitmap(struct bitmap_index *bitmap_git, + struct tree *tree, + struct bitmap *bitmap, + struct bitmap *seen, + int ignore_missing) { - while (roots) { - struct object *object = roots->item; - roots = roots->next; + struct tree_desc desc; + struct name_entry entry; - if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0) - return 1; + if (parse_tree(tree) < 0) { + if (ignore_missing) + return; + die("unable to parse tree %s", oid_to_hex(&tree->object.oid)); } - return 0; + init_tree_desc(&desc, tree->buffer, tree->size); + while (tree_entry(&desc, &entry)) { + if (S_ISDIR(entry.mode)) { + struct tree *t = lookup_tree(the_repository, &entry.oid); + if (!t) { + die(_("entry '%s' in tree %s has tree mode, " + "but is not a tree"), + entry.path, oid_to_hex(&tree->object.oid)); + } + add_object_to_bitmap(bitmap_git, &t->object, + bitmap, seen, ignore_missing); + } else if (!S_ISGITLINK(entry.mode)) { + struct blob *b = lookup_blob(the_repository, &entry.oid); + if (!b) { + die(_("entry '%s' in tree %s has blob mode, " + "but is not a blob"), + entry.path, oid_to_hex(&tree->object.oid)); + } + add_object_to_bitmap(bitmap_git, &b->object, + bitmap, seen, ignore_missing); + } + } + + free_tree_buffer(tree); +} + +static void add_object_to_bitmap(struct bitmap_index *bitmap_git, + struct object *obj, + struct bitmap *bitmap, + struct bitmap *seen, + int ignore_missing) +{ + int pos = bitmap_position(bitmap_git, &obj->oid); + + if (pos < 0) + pos = ext_index_add_object(bitmap_git, obj, NULL); + + if (bitmap_get(bitmap, pos)) + return; /* already there */ + + if (seen && bitmap_get(seen, pos)) + return; /* seen in other bitmap; not worth digging further */ + + bitmap_set(bitmap, pos); + + if (obj->type == OBJ_BLOB) + return; + else if (obj->type == OBJ_TAG) + add_tag_to_bitmap(bitmap_git, (struct tag *)obj, + bitmap, seen, ignore_missing); + else if (obj->type == OBJ_TREE) + add_tree_to_bitmap(bitmap_git, (struct tree *)obj, + bitmap, seen, ignore_missing); + else + BUG("unexpected object type: %d", obj->type); +} + +static void add_objects_to_bitmap(struct bitmap_index *bitmap_git, + struct object_list **list, + struct bitmap *bitmap, + struct bitmap *seen, + int ignore_missing) +{ + while (*list) { + struct object_list *cur = *list; + + add_object_to_bitmap(bitmap_git, cur->item, + bitmap, seen, ignore_missing); + + *list = cur->next; + free(cur); + } } struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) { unsigned int i; + struct prio_queue commits = { compare_commits_by_commit_date }; + struct object_list *wants = NULL; struct object_list *haves = NULL; @@ -714,24 +656,16 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) object = parse_object_or_die(&tag->tagged->oid, NULL); } - if (object->flags & UNINTERESTING) - object_list_insert(object, &haves); - else - object_list_insert(object, &wants); + if (object->type == OBJ_COMMIT) + prio_queue_put(&commits, object); + else { + if (object->flags & UNINTERESTING) + object_list_insert(object, &haves); + else + object_list_insert(object, &wants); + } } - /* - * if we have a HAVES list, but none of those haves is contained - * in the packfile that has a bitmap, we don't have anything to - * optimize here - */ - if (haves && !in_bitmapped_pack(bitmap_git, haves)) - goto cleanup; - - /* if we don't want anything, we're done here */ - if (!wants) - goto cleanup; - /* * now we're going to use bitmaps, so load the actual bitmap entries * from disk. this is the point of no return; after this the rev_list @@ -742,20 +676,74 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) object_array_clear(&revs->pending); - if (haves) { - revs->ignore_missing_links = 1; - haves_bitmap = find_objects(bitmap_git, revs, haves, NULL); - reset_revision_walk(); - revs->ignore_missing_links = 0; + haves_bitmap = bitmap_new(); + wants_bitmap = bitmap_new(); - if (haves_bitmap == NULL) - BUG("failed to perform bitmap walk"); - } + /* + * First traverse the relevant commits as we would for a normal + * traversal. + */ + while (commits.nr) { + struct commit *commit = prio_queue_get(&commits); + struct bitmap **dst_bitmap; + int pos; + struct commit_list *parent; + + if (commit->object.flags & UNINTERESTING) + dst_bitmap = &haves_bitmap; + else + dst_bitmap = &wants_bitmap; - wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap); + pos = bitmap_position(bitmap_git, &commit->object.oid); + if (pos >= 0 && *dst_bitmap && bitmap_get(*dst_bitmap, pos)) + continue; /* already covered */ - if (!wants_bitmap) - BUG("failed to perform bitmap walk"); + if (add_commit_to_bitmap(bitmap_git, dst_bitmap, commit)) + continue; /* skip adding parents, they're covered */ + + + /* otherwise mark ourselves and queue dependent objects */ + if (pos < 0) + pos = ext_index_add_object(bitmap_git, &commit->object, NULL); + bitmap_set(*dst_bitmap, pos); + if (parse_commit(commit)) { + if (commit->object.flags & UNINTERESTING) + continue; + die("unable to parse commit %s", + oid_to_hex(&commit->object.oid)); + } + if (commit->object.flags & UNINTERESTING) { + /* + * If an uninteresting commit is in the "wants" bitmap, + * then our tree is worth exploring. This means we may + * miss some trees in the "haves" that are not + * ancestors of "wants" (or that are, but we missed + * because of out-of-order timestamps). + */ + if (wants_bitmap && bitmap_get(wants_bitmap, pos)) + object_list_insert(&get_commit_tree(commit)->object, + &haves); + } else { + /* + * "wants" must always be explored + */ + object_list_insert(&get_commit_tree(commit)->object, + &wants); + } + + for (parent = commit->parents; parent; parent = parent->next) { + if (commit->object.flags & UNINTERESTING) + parent->item->object.flags |= UNINTERESTING; + prio_queue_put(&commits, parent->item); + } + } + + /* + * At this point we've processed all of the commits, and queued items + * in "haves" and "wants" that need to be marked. + */ + add_objects_to_bitmap(bitmap_git, &haves, haves_bitmap, NULL, 1); + add_objects_to_bitmap(bitmap_git, &wants, wants_bitmap, haves_bitmap, 0); if (haves_bitmap) bitmap_and_not(wants_bitmap, haves_bitmap);
On 5/23/2019 7:30 AM, Jeff King wrote: > On Wed, Apr 10, 2019 at 06:57:21PM -0400, Jeff King wrote: > >> 2. The answer we get from a bitmap versus a regular traversal are not >> apples-to-apples equivalent. The regular traversal walks down to >> the UNINTERESTING commits, marks the boundaries trees and blobs as >> UNINTERESTING, and then adds in all the interesting trees and blobs >> minus the UNINTERESTING parts. So it can sometimes give the wrong >> answer, claiming something is interesting when it is not. >> >> Whereas with bitmaps we fill in the trees and blobs as we walk, and >> you get the true answer. But it means we may open up a lot more >> trees than the equivalent traversal would. >> >> So one thing I've never really experimented with (and indeed, never >> really thought about until writing this email) is that the bitmaps >> could try to do that looser style of traversal, knowing that we >> might err on the side of calling things interesting in a few cases. >> But hopefully spending a lot less time opening trees. >> >> I'm not even 100% sure what that would look like in code, but just >> thinking about it from a high level, I don't there's a particular >> mathematical reason it couldn't work. > > I spent a while thinking and experimenting with this tonight. The result > is the patch below. Ævar, do you still have a copy of the repo that > misbehaved? I'd be curious to see how it fares. This patch caught my attention, and I think I understand some of the issues at hand. I'm not well-versed specifically in Git's bitmap implementation. The fundamental ideas are there, but my perspective is biased by my own independent bitmap implementation for Azure Repos. What worked there may not work at all here. > Finding the right trees to explore is a little tricky with bitmaps. In > a normal traversal, we consider the "edges" to be worth exploring. > I.e., the places where an UNINTERESTING commit is the parent of an > interesting one. This is the "commit frontier" which I bring up again below. > But with bitmaps, we don't have that information in the same way, > because we're trying to avoid walking the commits in the first place. So > e.g., in a history like this: > > A--B--C > \ > D > > Let's imagine we're computing "C..D", and that D has a bitmap on disk > but C does not. (As I read your discussion below, I'm confused. For "C..D", C is a have and D is a want. We should explore all the haves _first_, then walk the wants, right?) > When we visit D, we'll see that it has a bitmap, fill in > the results in our cumulative "want" bitmap, and then stop walking its > parents (because we know everything they could tell us already). I may be misreading something, but we would construct _a_ bitmap for C by walking its reachable objects until hitting a bitmap we know about. Perhaps A or B have a bitmap to start the construction. If we never construct a bitmap for C, then we don't know what to remove from the "D" bitmap to show the difference. If "C" is not even in the bitmap pack, then we don't use bitmaps for this calculation because of this condition: /* * if we have a HAVES list, but none of those haves is contained * in the packfile that has a bitmap, we don't have anything to * optimize here */ if (haves && !in_bitmapped_pack(bitmap_git, haves)) goto cleanup; > Then we visit C. It's not bitmapped, so we keep walking. Unlike a normal > traversal, we can't stop walking even though there are only > UNINTERESTING commits left, because we have to fill the complete bitmap, > including A and B (and you can imagine there might be thousands of > ancestors of A, too). The reason is that we skipped adding ancestors of > D when we saw its bitmap, so no still_interesting() cutoff will work > there. > > But how do we know that it's worth looking into the tree of "B"? If we > assume we visit commits before their ancestors (because of the commit > timestamp ordering), then we'd see that "B" is in the "want" bitmap > already, which makes it worth visiting its tree. > > Unfortunately we'd then continue on to "A", and it would _also_ look > interesting, because it's also in the "want" bitmap. We don't have an > easy way of knowing that "A" is basically already covered by "B", and is > therefore not worth pursuing. In this example, we could pass down a bit > from B to A as we traverse. But you can imagine another interesting > commit branched from A, which _would_ make A's tree worth considering. > > Fundamentally, as soon as we load a bitmap and stop traversing, we lose > all information about the graph structure. > > So the patch below just looks at every tree that might be worth > exploring (so both "A" and "B" here, but not "C"). That shouldn't be any > _worse_ than what the current code does (because it looks at all the > trees). It appears to behave correctly, at least so far as passing the > test suite. Running p5310 and p5311 against git.git and linux.git, it > seems to perform worse. I'm not quite sure why that is (i.e., if I > screwed up something in the implementation, or if the algorithm is > fundamentally worse). I'm of the opinion that the old method was better. It followed a very clear three-step process: 1. Compute the "haves" bitmap. 2. Compute the "wants" bitmap, but don't explore into the "haves" bitmap. 3. Subtract the "haves" bitmap from the "wants" (in case we added bits to the wants before they were in the haves). But there is hope in your idea to improve some cases. As noted, we give up if all of the haves are not in the bitmapped pack. By starting with a commit walk, we can find the "commit frontier," which is the set of commits that are explored by paint_down_to_common(). In this case, the set {B, C, D}. After walking commits and finding a set of UNINTERESTING commits, we could look for any UNINTERESTING commits that have bitmaps (or simply are in the bitmapped pack) and use those to see our "haves" bitmap. So, if B has a bitmap, but C is too new for the bitmap, then we can still create the haves based on B and then take a bitmap diff "D minus B". In fact, using the "merge base set" for the diff reflects what the non-bitmap algorithm does in this case. It ignores C and only explores B. I learned a lot looking at both versions of this method side-by-side. Thanks! -Stolee
On Thu, May 23 2019, Jeff King wrote: > On Wed, Apr 10, 2019 at 06:57:21PM -0400, Jeff King wrote: > >> 2. The answer we get from a bitmap versus a regular traversal are not >> apples-to-apples equivalent. The regular traversal walks down to >> the UNINTERESTING commits, marks the boundaries trees and blobs as >> UNINTERESTING, and then adds in all the interesting trees and blobs >> minus the UNINTERESTING parts. So it can sometimes give the wrong >> answer, claiming something is interesting when it is not. >> >> Whereas with bitmaps we fill in the trees and blobs as we walk, and >> you get the true answer. But it means we may open up a lot more >> trees than the equivalent traversal would. >> >> So one thing I've never really experimented with (and indeed, never >> really thought about until writing this email) is that the bitmaps >> could try to do that looser style of traversal, knowing that we >> might err on the side of calling things interesting in a few cases. >> But hopefully spending a lot less time opening trees. >> >> I'm not even 100% sure what that would look like in code, but just >> thinking about it from a high level, I don't there's a particular >> mathematical reason it couldn't work. > > I spent a while thinking and experimenting with this tonight. The result > is the patch below. Ævar, do you still have a copy of the repo that > misbehaved? I'd be curious to see how it fares. No, sorry. I think we could make an artificial test to emulate it, which would be something like: * ~1 million commits * local clone setup to fetch all branches/tags (default 'git clone') * local a bit ahead/behind * Have old "main" pack with *.bitmap, bunch of other packs/loose objects without that * push try to push the local change upstream (or to a topic branch) I tried briefly to emulate this with git.git with: ( rm -rf /tmp/git /tmp/git.old && git init /tmp/git.old && git clone --bare https://github.com/git/git.git /tmp/git && cd /tmp/git && old_commit=$(git rev-parse v2.20.0^{}) && git rev-list v2.12.0..v2.21.0|parallel 'git branch topic-{} {}' && cd /tmp/git.old && echo /tmp/git/objects >.git/objects/info/alternates && git branch master $old_commit && git reset --hard master && git repack -Adb && rm .git/objects/info/alternates && for c in {1..10} do >$c && git add $c && git commit -m"bump $c" done ) But didn't get anywhere, probably because here my topics are all stuff I have already, and I just have 2x packs. It would be really cool to have some test tool that could produce various "shape of repo" states like that. I.e. given an end-state of a public repo emulate various plausible local client states like that given some assumptions about how often the local client fetches, what the GC settings are etc. > Finding the right trees to explore is a little tricky with bitmaps. In > a normal traversal, we consider the "edges" to be worth exploring. > I.e., the places where an UNINTERESTING commit is the parent of an > interesting one. > > But with bitmaps, we don't have that information in the same way, > because we're trying to avoid walking the commits in the first place. So > e.g., in a history like this: > > A--B--C > \ > D > > Let's imagine we're computing "C..D", and that D has a bitmap on disk > but C does not. When we visit D, we'll see that it has a bitmap, fill in > the results in our cumulative "want" bitmap, and then stop walking its > parents (because we know everything they could tell us already). > > Then we visit C. It's not bitmapped, so we keep walking. Unlike a normal > traversal, we can't stop walking even though there are only > UNINTERESTING commits left, because we have to fill the complete bitmap, > including A and B (and you can imagine there might be thousands of > ancestors of A, too). The reason is that we skipped adding ancestors of > D when we saw its bitmap, so no still_interesting() cutoff will work > there. > > But how do we know that it's worth looking into the tree of "B"? If we > assume we visit commits before their ancestors (because of the commit > timestamp ordering), then we'd see that "B" is in the "want" bitmap > already, which makes it worth visiting its tree. > > Unfortunately we'd then continue on to "A", and it would _also_ look > interesting, because it's also in the "want" bitmap. We don't have an > easy way of knowing that "A" is basically already covered by "B", and is > therefore not worth pursuing. In this example, we could pass down a bit > from B to A as we traverse. But you can imagine another interesting > commit branched from A, which _would_ make A's tree worth considering. > > Fundamentally, as soon as we load a bitmap and stop traversing, we lose > all information about the graph structure. > > So the patch below just looks at every tree that might be worth > exploring (so both "A" and "B" here, but not "C"). That shouldn't be any > _worse_ than what the current code does (because it looks at all the > trees). It appears to behave correctly, at least so far as passing the > test suite. Running p5310 and p5311 against git.git and linux.git, it > seems to perform worse. I'm not quite sure why that is (i.e., if I > screwed up something in the implementation, or if the algorithm is > fundamentally worse). > > There are a lot of rough edges in the patch; I was just trying to see if > the idea was even viable. So don't bother reviewing it as a real > proposal for inclusion. If you do read it, I'd recommend just looking at > the post-image, since it's essentially a rewrite and the diff is a bit > messy. > > -Peff > > --- > pack-bitmap.c | 398 ++++++++++++++++++++++++-------------------------- > 1 file changed, 193 insertions(+), 205 deletions(-) > > diff --git a/pack-bitmap.c b/pack-bitmap.c > index 6069b2fe55..4a40f62a38 100644 > --- a/pack-bitmap.c > +++ b/pack-bitmap.c > @@ -12,6 +12,8 @@ > #include "packfile.h" > #include "repository.h" > #include "object-store.h" > +#include "blob.h" > +#include "prio-queue.h" > > /* > * An entry on the bitmap index, representing the bitmap for a given > @@ -422,180 +424,22 @@ static int ext_index_add_object(struct bitmap_index *bitmap_git, > return bitmap_pos + bitmap_git->pack->num_objects; > } > > -struct bitmap_show_data { > - struct bitmap_index *bitmap_git; > - struct bitmap *base; > -}; > - > -static void show_object(struct object *object, const char *name, void *data_) > -{ > - struct bitmap_show_data *data = data_; > - int bitmap_pos; > - > - bitmap_pos = bitmap_position(data->bitmap_git, &object->oid); > - > - if (bitmap_pos < 0) > - bitmap_pos = ext_index_add_object(data->bitmap_git, object, > - name); > - > - bitmap_set(data->base, bitmap_pos); > -} > - > -static void show_commit(struct commit *commit, void *data) > -{ > -} > - > -static int add_to_include_set(struct bitmap_index *bitmap_git, > - struct include_data *data, > - const struct object_id *oid, > - int bitmap_pos) > -{ > - khiter_t hash_pos; > - > - if (data->seen && bitmap_get(data->seen, bitmap_pos)) > - return 0; > - > - if (bitmap_get(data->base, bitmap_pos)) > - return 0; > - > - hash_pos = kh_get_oid_map(bitmap_git->bitmaps, *oid); > - if (hash_pos < kh_end(bitmap_git->bitmaps)) { > - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, hash_pos); > - bitmap_or_ewah(data->base, lookup_stored_bitmap(st)); > - return 0; > - } > - > - bitmap_set(data->base, bitmap_pos); > - return 1; > -} > - > -static int should_include(struct commit *commit, void *_data) > -{ > - struct include_data *data = _data; > - int bitmap_pos; > - > - bitmap_pos = bitmap_position(data->bitmap_git, &commit->object.oid); > - if (bitmap_pos < 0) > - bitmap_pos = ext_index_add_object(data->bitmap_git, > - (struct object *)commit, > - NULL); > - > - if (!add_to_include_set(data->bitmap_git, data, &commit->object.oid, > - bitmap_pos)) { > - struct commit_list *parent = commit->parents; > - > - while (parent) { > - parent->item->object.flags |= SEEN; > - parent = parent->next; > - } > - > - return 0; > - } > - > - return 1; > -} > - > -static struct bitmap *find_objects(struct bitmap_index *bitmap_git, > - struct rev_info *revs, > - struct object_list *roots, > - struct bitmap *seen) > +static int add_commit_to_bitmap(struct bitmap_index *bitmap_git, > + struct bitmap **base, struct commit *commit) > { > - struct bitmap *base = NULL; > - int needs_walk = 0; > - > - struct object_list *not_mapped = NULL; > - > - /* > - * Go through all the roots for the walk. The ones that have bitmaps > - * on the bitmap index will be `or`ed together to form an initial > - * global reachability analysis. > - * > - * The ones without bitmaps in the index will be stored in the > - * `not_mapped_list` for further processing. > - */ > - while (roots) { > - struct object *object = roots->item; > - roots = roots->next; > - > - if (object->type == OBJ_COMMIT) { > - khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, object->oid); > - > - if (pos < kh_end(bitmap_git->bitmaps)) { > - struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); > - struct ewah_bitmap *or_with = lookup_stored_bitmap(st); > - > - if (base == NULL) > - base = ewah_to_bitmap(or_with); > - else > - bitmap_or_ewah(base, or_with); > - > - object->flags |= SEEN; > - continue; > - } > - } > - > - object_list_insert(object, ¬_mapped); > - } > - > - /* > - * Best case scenario: We found bitmaps for all the roots, > - * so the resulting `or` bitmap has the full reachability analysis > - */ > - if (not_mapped == NULL) > - return base; > - > - roots = not_mapped; > - > - /* > - * Let's iterate through all the roots that don't have bitmaps to > - * check if we can determine them to be reachable from the existing > - * global bitmap. > - * > - * If we cannot find them in the existing global bitmap, we'll need > - * to push them to an actual walk and run it until we can confirm > - * they are reachable > - */ > - while (roots) { > - struct object *object = roots->item; > - int pos; > - > - roots = roots->next; > - pos = bitmap_position(bitmap_git, &object->oid); > - > - if (pos < 0 || base == NULL || !bitmap_get(base, pos)) { > - object->flags &= ~UNINTERESTING; > - add_pending_object(revs, object, ""); > - needs_walk = 1; > - } else { > - object->flags |= SEEN; > - } > - } > - > - if (needs_walk) { > - struct include_data incdata; > - struct bitmap_show_data show_data; > - > - if (base == NULL) > - base = bitmap_new(); > - > - incdata.bitmap_git = bitmap_git; > - incdata.base = base; > - incdata.seen = seen; > - > - revs->include_check = should_include; > - revs->include_check_data = &incdata; > - > - if (prepare_revision_walk(revs)) > - die("revision walk setup failed"); > + khiter_t pos = kh_get_oid_map(bitmap_git->bitmaps, commit->object.oid); > + if (pos < kh_end(bitmap_git->bitmaps)) { > + struct stored_bitmap *st = kh_value(bitmap_git->bitmaps, pos); > + struct ewah_bitmap *or_with = lookup_stored_bitmap(st); > > - show_data.bitmap_git = bitmap_git; > - show_data.base = base; > + if (*base == NULL) > + *base = ewah_to_bitmap(or_with); > + else > + bitmap_or_ewah(*base, or_with); > > - traverse_commit_list(revs, show_commit, show_object, > - &show_data); > + return 1; > } > - > - return base; > + return 0; > } > > static void show_extended_objects(struct bitmap_index *bitmap_git, > @@ -665,24 +509,122 @@ static void show_objects_for_type( > } > } > > -static int in_bitmapped_pack(struct bitmap_index *bitmap_git, > - struct object_list *roots) > +static void add_object_to_bitmap(struct bitmap_index *bitmap_git, > + struct object *obj, > + struct bitmap *bitmap, > + struct bitmap *seen, > + int ignore_missing); > + > +static void add_tag_to_bitmap(struct bitmap_index *bitmap_git, > + struct tag *tag, > + struct bitmap *bitmap, > + struct bitmap *seen, > + int ignore_missing) > +{ > + if (parse_tag(tag) < 0 || !tag->tagged) { > + if (ignore_missing) > + return; > + die("unable to parse tag %s", oid_to_hex(&tag->object.oid)); > + } > + add_object_to_bitmap(bitmap_git, tag->tagged, bitmap, seen, > + ignore_missing); > +} > + > +static void add_tree_to_bitmap(struct bitmap_index *bitmap_git, > + struct tree *tree, > + struct bitmap *bitmap, > + struct bitmap *seen, > + int ignore_missing) > { > - while (roots) { > - struct object *object = roots->item; > - roots = roots->next; > + struct tree_desc desc; > + struct name_entry entry; > > - if (find_pack_entry_one(object->oid.hash, bitmap_git->pack) > 0) > - return 1; > + if (parse_tree(tree) < 0) { > + if (ignore_missing) > + return; > + die("unable to parse tree %s", oid_to_hex(&tree->object.oid)); > } > > - return 0; > + init_tree_desc(&desc, tree->buffer, tree->size); > + while (tree_entry(&desc, &entry)) { > + if (S_ISDIR(entry.mode)) { > + struct tree *t = lookup_tree(the_repository, &entry.oid); > + if (!t) { > + die(_("entry '%s' in tree %s has tree mode, " > + "but is not a tree"), > + entry.path, oid_to_hex(&tree->object.oid)); > + } > + add_object_to_bitmap(bitmap_git, &t->object, > + bitmap, seen, ignore_missing); > + } else if (!S_ISGITLINK(entry.mode)) { > + struct blob *b = lookup_blob(the_repository, &entry.oid); > + if (!b) { > + die(_("entry '%s' in tree %s has blob mode, " > + "but is not a blob"), > + entry.path, oid_to_hex(&tree->object.oid)); > + } > + add_object_to_bitmap(bitmap_git, &b->object, > + bitmap, seen, ignore_missing); > + } > + } > + > + free_tree_buffer(tree); > +} > + > +static void add_object_to_bitmap(struct bitmap_index *bitmap_git, > + struct object *obj, > + struct bitmap *bitmap, > + struct bitmap *seen, > + int ignore_missing) > +{ > + int pos = bitmap_position(bitmap_git, &obj->oid); > + > + if (pos < 0) > + pos = ext_index_add_object(bitmap_git, obj, NULL); > + > + if (bitmap_get(bitmap, pos)) > + return; /* already there */ > + > + if (seen && bitmap_get(seen, pos)) > + return; /* seen in other bitmap; not worth digging further */ > + > + bitmap_set(bitmap, pos); > + > + if (obj->type == OBJ_BLOB) > + return; > + else if (obj->type == OBJ_TAG) > + add_tag_to_bitmap(bitmap_git, (struct tag *)obj, > + bitmap, seen, ignore_missing); > + else if (obj->type == OBJ_TREE) > + add_tree_to_bitmap(bitmap_git, (struct tree *)obj, > + bitmap, seen, ignore_missing); > + else > + BUG("unexpected object type: %d", obj->type); > +} > + > +static void add_objects_to_bitmap(struct bitmap_index *bitmap_git, > + struct object_list **list, > + struct bitmap *bitmap, > + struct bitmap *seen, > + int ignore_missing) > +{ > + while (*list) { > + struct object_list *cur = *list; > + > + add_object_to_bitmap(bitmap_git, cur->item, > + bitmap, seen, ignore_missing); > + > + *list = cur->next; > + free(cur); > + } > } > > struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) > { > unsigned int i; > > + struct prio_queue commits = { compare_commits_by_commit_date }; > + > struct object_list *wants = NULL; > struct object_list *haves = NULL; > > @@ -714,24 +656,16 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) > object = parse_object_or_die(&tag->tagged->oid, NULL); > } > > - if (object->flags & UNINTERESTING) > - object_list_insert(object, &haves); > - else > - object_list_insert(object, &wants); > + if (object->type == OBJ_COMMIT) > + prio_queue_put(&commits, object); > + else { > + if (object->flags & UNINTERESTING) > + object_list_insert(object, &haves); > + else > + object_list_insert(object, &wants); > + } > } > > - /* > - * if we have a HAVES list, but none of those haves is contained > - * in the packfile that has a bitmap, we don't have anything to > - * optimize here > - */ > - if (haves && !in_bitmapped_pack(bitmap_git, haves)) > - goto cleanup; > - > - /* if we don't want anything, we're done here */ > - if (!wants) > - goto cleanup; > - > /* > * now we're going to use bitmaps, so load the actual bitmap entries > * from disk. this is the point of no return; after this the rev_list > @@ -742,20 +676,74 @@ struct bitmap_index *prepare_bitmap_walk(struct rev_info *revs) > > object_array_clear(&revs->pending); > > - if (haves) { > - revs->ignore_missing_links = 1; > - haves_bitmap = find_objects(bitmap_git, revs, haves, NULL); > - reset_revision_walk(); > - revs->ignore_missing_links = 0; > + haves_bitmap = bitmap_new(); > + wants_bitmap = bitmap_new(); > > - if (haves_bitmap == NULL) > - BUG("failed to perform bitmap walk"); > - } > + /* > + * First traverse the relevant commits as we would for a normal > + * traversal. > + */ > + while (commits.nr) { > + struct commit *commit = prio_queue_get(&commits); > + struct bitmap **dst_bitmap; > + int pos; > + struct commit_list *parent; > + > + if (commit->object.flags & UNINTERESTING) > + dst_bitmap = &haves_bitmap; > + else > + dst_bitmap = &wants_bitmap; > > - wants_bitmap = find_objects(bitmap_git, revs, wants, haves_bitmap); > + pos = bitmap_position(bitmap_git, &commit->object.oid); > + if (pos >= 0 && *dst_bitmap && bitmap_get(*dst_bitmap, pos)) > + continue; /* already covered */ > > - if (!wants_bitmap) > - BUG("failed to perform bitmap walk"); > + if (add_commit_to_bitmap(bitmap_git, dst_bitmap, commit)) > + continue; /* skip adding parents, they're covered */ > + > + > + /* otherwise mark ourselves and queue dependent objects */ > + if (pos < 0) > + pos = ext_index_add_object(bitmap_git, &commit->object, NULL); > + bitmap_set(*dst_bitmap, pos); > + if (parse_commit(commit)) { > + if (commit->object.flags & UNINTERESTING) > + continue; > + die("unable to parse commit %s", > + oid_to_hex(&commit->object.oid)); > + } > + if (commit->object.flags & UNINTERESTING) { > + /* > + * If an uninteresting commit is in the "wants" bitmap, > + * then our tree is worth exploring. This means we may > + * miss some trees in the "haves" that are not > + * ancestors of "wants" (or that are, but we missed > + * because of out-of-order timestamps). > + */ > + if (wants_bitmap && bitmap_get(wants_bitmap, pos)) > + object_list_insert(&get_commit_tree(commit)->object, > + &haves); > + } else { > + /* > + * "wants" must always be explored > + */ > + object_list_insert(&get_commit_tree(commit)->object, > + &wants); > + } > + > + for (parent = commit->parents; parent; parent = parent->next) { > + if (commit->object.flags & UNINTERESTING) > + parent->item->object.flags |= UNINTERESTING; > + prio_queue_put(&commits, parent->item); > + } > + } > + > + /* > + * At this point we've processed all of the commits, and queued items > + * in "haves" and "wants" that need to be marked. > + */ > + add_objects_to_bitmap(bitmap_git, &haves, haves_bitmap, NULL, 1); > + add_objects_to_bitmap(bitmap_git, &wants, wants_bitmap, haves_bitmap, 0); > > if (haves_bitmap) > bitmap_and_not(wants_bitmap, haves_bitmap);
On Thu, May 23, 2019 at 08:53:56AM -0400, Derrick Stolee wrote: > > I spent a while thinking and experimenting with this tonight. The result > > is the patch below. Ævar, do you still have a copy of the repo that > > misbehaved? I'd be curious to see how it fares. > > This patch caught my attention, and I think I understand some of the issues > at hand. I'm not well-versed specifically in Git's bitmap implementation. > The fundamental ideas are there, but my perspective is biased by my own > independent bitmap implementation for Azure Repos. What worked there may not > work at all here. Thanks for looking at this. There are a lot of short-comings in the current bitmap implementation, so if there's a better way to do things, I'm not opposed to moving to a new format. :) > > Finding the right trees to explore is a little tricky with bitmaps. In > > a normal traversal, we consider the "edges" to be worth exploring. > > I.e., the places where an UNINTERESTING commit is the parent of an > > interesting one. > > This is the "commit frontier" which I bring up again below. Right. I actually had trouble coming up with a succinct way of describing this, and stole the definition from your recent blog post. ;) > > But with bitmaps, we don't have that information in the same way, > > because we're trying to avoid walking the commits in the first place. So > > e.g., in a history like this: > > > > A--B--C > > \ > > D > > > > Let's imagine we're computing "C..D", and that D has a bitmap on disk > > but C does not. > > (As I read your discussion below, I'm confused. For "C..D", C is a have > and D is a want. We should explore all the haves _first_, then walk the > wants, right?) I think I may have confused things by starting my description with a hypothetical combined want/have walk. To take a step back: the problem we were discussing is that we spend a lot of time reading trees to fill in the "have" bitmap, even though most of those objects are unlikely to be in the "want" in the first place (only the frontier trees are really useful). The current code does indeed fill the "have" bitmap first (so that while walking "want", it can avoid walking down paths whose bits we know we're going to clear eventually anyway). But we can't know the frontier if we completely fill the "have" bitmap first. We have to walk both sides together, looking for the frontier. > > When we visit D, we'll see that it has a bitmap, fill in > > the results in our cumulative "want" bitmap, and then stop walking its > > parents (because we know everything they could tell us already). > > I may be misreading something, but we would construct _a_ bitmap for C > by walking its reachable objects until hitting a bitmap we know about. > Perhaps A or B have a bitmap to start the construction. If we never > construct a bitmap for C, then we don't know what to remove from the "D" > bitmap to show the difference. Yes, and that's how the current code works. If we walk back to B and it has a bitmap, we can stop walking there and just OR in its bitmap, and look at the trees for any intermediate commits (in this case just C) to fill in the rest. The problem is that we don't have a bitmap for every commit. So you can imagine a history like this: A_1 -- A_2 -- ... -- A_1000 -- B -- C \ D where we have a bitmap _only_ for D. Filling in the accurate and true bitmap for C requires walking a thousand commits (and their trees!), when the non-bitmap algorithm would find the frontier at B and call that good enough. > If "C" is not even in the bitmap pack, then we don't use bitmaps for > this calculation because of this condition: > > /* > * if we have a HAVES list, but none of those haves is contained > * in the packfile that has a bitmap, we don't have anything to > * optimize here > */ > if (haves && !in_bitmapped_pack(bitmap_git, haves)) > goto cleanup; Right, but it may be in the pack but without a bitmap. We don't store a bitmap for every commit. The idea was to save space, but select some key commits that let us generally find a bitmap with a small amount of walking. And most of the time it works fine. But in some cases, we seem to find pathological cases where we do quite a lot of walking. As I said earlier in the thread, I suspect our commit selection is not great. It's mostly some heuristics we threw together in 2013, and I don't think it was tested very thoroughly. So fixing that may be a simpler approach. What I was wondering here was whether we could get an easy fix based on the same frontier ideas that the non-bitmap walk uses. > I'm of the opinion that the old method was better. It followed a very clear > three-step process: > > 1. Compute the "haves" bitmap. > > 2. Compute the "wants" bitmap, but don't explore into the "haves" bitmap. > > 3. Subtract the "haves" bitmap from the "wants" (in case we added bits to > the wants before they were in the haves). Right, this is _way_ simpler. But it necessarily means spending effort to find the complete "haves", because we don't know which parts are useful. > But there is hope in your idea to improve some cases. As noted, we give up > if all of the haves are not in the bitmapped pack. By starting with a > commit walk, we can find the "commit frontier," which is the set of commits > that are explored by paint_down_to_common(). In this case, the set {B, C, D}. > > After walking commits and finding a set of UNINTERESTING commits, we could > look for any UNINTERESTING commits that have bitmaps (or simply are in the > bitmapped pack) and use those to see our "haves" bitmap. So, if B has a > bitmap, but C is too new for the bitmap, then we can still create the haves > based on B and then take a bitmap diff "D minus B". But doing that commit walk to find the frontier negates part of the purpose of using the bitmaps, which is avoiding even walking commits. Going back to a variant of our example: A -- B -- C_1 -- .. -- C_1000 \ D_1 -- .. -- D_1000 If we have a bitmap at C_1000 and D_1000, we don't have to walk any commits at all. But finding the frontier requires walking 2000 commits. Is there a way to find it with just bitmaps? You can certainly find the intersection, but you don't have any idea which of the many shared commits is the merge base. Of course in this example you don't actually care about the frontier (you have the full answer immediately). But how do you decide which way to optimize: try to avoid walking commits to get a quick answer from bitmaps, or prefer to walk some commits to find the frontier and avoid opening too many trees? -Peff
On Thu, May 23, 2019 at 09:26:34PM +0200, Ævar Arnfjörð Bjarmason wrote: > > I spent a while thinking and experimenting with this tonight. The result > > is the patch below. Ævar, do you still have a copy of the repo that > > misbehaved? I'd be curious to see how it fares. > > No, sorry. I think we could make an artificial test to emulate it, which > would be something like: > > * ~1 million commits > * local clone setup to fetch all branches/tags (default 'git clone') > * local a bit ahead/behind > * Have old "main" pack with *.bitmap, bunch of other packs/loose objects without that > * push try to push the local change upstream (or to a topic branch) > > I tried briefly to emulate this with git.git with: > [...] > But didn't get anywhere, probably because here my topics are all stuff I > have already, and I just have 2x packs. Yeah, I haven't been able to find a reproduction for this problem at will. The bitmaps are _supposed_ to be sprinkled around through the commit graph so that we don't have to walk far. But presumably in your case that was not so. I'm not sure what tickles the bitmap-writer to fail so hard. Is it having too many refs? Weird patterns in the graph? Just a ton of commits? -Peff
On Fri, May 24 2019, Jeff King wrote: > On Thu, May 23, 2019 at 09:26:34PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > I spent a while thinking and experimenting with this tonight. The result >> > is the patch below. Ævar, do you still have a copy of the repo that >> > misbehaved? I'd be curious to see how it fares. >> >> No, sorry. I think we could make an artificial test to emulate it, which >> would be something like: >> >> * ~1 million commits >> * local clone setup to fetch all branches/tags (default 'git clone') >> * local a bit ahead/behind >> * Have old "main" pack with *.bitmap, bunch of other packs/loose objects without that >> * push try to push the local change upstream (or to a topic branch) >> >> I tried briefly to emulate this with git.git with: >> [...] >> But didn't get anywhere, probably because here my topics are all stuff I >> have already, and I just have 2x packs. > > Yeah, I haven't been able to find a reproduction for this problem at > will. The bitmaps are _supposed_ to be sprinkled around through the > commit graph so that we don't have to walk far. But presumably in your > case that was not so. > > I'm not sure what tickles the bitmap-writer to fail so hard. Is it > having too many refs? Weird patterns in the graph? Just a ton of > commits? Ah, why did only this ancient (big) pack have a bitmap? The bitmap writer had never failed, this was just a repository where some automation (on a dev/staging box) cloned a repo, and someone had once run a manual "repack" to make make a pack with a bitmap. Then as time passed that pack stayed around, and re-looking at this that could have only happened because I had gc.bigPackThreshold turned on. I.e. without that we'd have eventually done a full repack, so the bitmap would have gone away. So getting the repo into that state was a series of unlikely events. I think to the extent that this is an issue we can reproduce in the future the proper fix for it in lieu of some easy fix in the bitmap code would be to just teach "gc" to unlink old *.bitmap files if we detect they're too stale. I.e. we don't need to deal gracefully with some case where the bitmaps just cover some tiny part of the graph, we can just teach "gc" to either update them, or (if we're not currently writing them) unlink them. That seems to me to be a good idea in general, not just with bitmaps but also the commit graph. If we're doing a GC and our current settings aren't such that we'd update those files, shouldn't we just unlink them?
On Fri, May 24, 2019 at 09:55:05AM +0200, Ævar Arnfjörð Bjarmason wrote: > > I'm not sure what tickles the bitmap-writer to fail so hard. Is it > > having too many refs? Weird patterns in the graph? Just a ton of > > commits? > > Ah, why did only this ancient (big) pack have a bitmap? > > The bitmap writer had never failed, this was just a repository where > some automation (on a dev/staging box) cloned a repo, and someone had > once run a manual "repack" to make make a pack with a bitmap. Just to be clear, by "fail" I didn't mean that the writer failed to produce. I just meant it had poor commit selection for its bitmap coverage. But... > Then as time passed that pack stayed around, and re-looking at this that > could have only happened because I had gc.bigPackThreshold turned on. > > I.e. without that we'd have eventually done a full repack, so the bitmap > would have gone away. > > So getting the repo into that state was a series of unlikely events. Ah, now that's interesting. The issue may have just been that we had a ton of un-bitmapped commits because they weren't in the bitmapped pack at all. The logic that Stolee pointed out earlier: /* * if we have a HAVES list, but none of those haves is contained * in the packfile that has a bitmap, we don't have anything to * optimize here */ if (haves && !in_bitmapped_pack(bitmap_git, haves)) goto cleanup; is pretty feeble. If you have even _one_ UNINTERESTING tip that's in the pack, we'll try to use bitmaps. And in your case, you almost certainly had old tags on both the client and the server there were in the old bitmapped pack, but then a huge swath of history that had happened since then. And it was that newer part of the graph that we had to walk to fill in the bitmap. So all of this makes pretty good sense to me now. Bitmaps work incrementally as you add new, un-bitmapped history. But the cost gets higher and higher the longer you go before repacking and generating new bitmaps. So your case was very similar to a repo that uses bitmaps but just hadn't packed in a really long time. > I think to the extent that this is an issue we can reproduce in the > future the proper fix for it in lieu of some easy fix in the bitmap code > would be to just teach "gc" to unlink old *.bitmap files if we detect > they're too stale. Yeah. This could happen if you simply accumulated history without ever running "gc". But in general we can probably assume that "gc" will run periodically (though there is a real blind spot if you push up a very huge chunk of history in one pack, since gc counts packs, not objects). I agree that if gc is deciding to leave a big pack, it should probably ditch the bitmaps for it. > I.e. we don't need to deal gracefully with some case where the bitmaps > just cover some tiny part of the graph, we can just teach "gc" to either > update them, or (if we're not currently writing them) unlink them. Unfortunately I don't think we can update them, because all of the reachable objects need to be in the same pack. So any scheme that isn't doing a periodic all-into-one repack probably shouldn't be using bitmaps. I wonder if we need to tweak Eric's bitmaps-by-default logic to better account for this. > That seems to me to be a good idea in general, not just with bitmaps but > also the commit graph. If we're doing a GC and our current settings > aren't such that we'd update those files, shouldn't we just unlink them? I don't think the commit graph would suffer from this. It's not tied to a specific pack, so it can be freely updated on any gc. You still have the problem when gc does not run. It's possible that auto-gc should have separate thresholds for different activities (e.g., number of packs should tell us when to repack objects, number of loose refs should tell us when to pack refs, number of un-annotated commits should tell us when to update the commit graph). The trouble is that some of those checks are non-trivial. In the long run, I think the plan is for the commit graph to allow cheap incremental updating, which may make it reasonable to just update it preemptively after every fetch/push. -Peff
On Fri, May 24 2019, Jeff King wrote: > On Fri, May 24, 2019 at 09:55:05AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > I'm not sure what tickles the bitmap-writer to fail so hard. Is it >> > having too many refs? Weird patterns in the graph? Just a ton of >> > commits? >> >> Ah, why did only this ancient (big) pack have a bitmap? >> >> The bitmap writer had never failed, this was just a repository where >> some automation (on a dev/staging box) cloned a repo, and someone had >> once run a manual "repack" to make make a pack with a bitmap. > > Just to be clear, by "fail" I didn't mean that the writer failed to > produce. I just meant it had poor commit selection for its bitmap > coverage. But... > >> Then as time passed that pack stayed around, and re-looking at this that >> could have only happened because I had gc.bigPackThreshold turned on. >> >> I.e. without that we'd have eventually done a full repack, so the bitmap >> would have gone away. >> >> So getting the repo into that state was a series of unlikely events. > > Ah, now that's interesting. The issue may have just been that we had a > ton of un-bitmapped commits because they weren't in the bitmapped pack > at all. The logic that Stolee pointed out earlier: > > /* > * if we have a HAVES list, but none of those haves is contained > * in the packfile that has a bitmap, we don't have anything to > * optimize here > */ > if (haves && !in_bitmapped_pack(bitmap_git, haves)) > goto cleanup; > > is pretty feeble. If you have even _one_ UNINTERESTING tip that's in the > pack, we'll try to use bitmaps. And in your case, you almost certainly > had old tags on both the client and the server there were in the old > bitmapped pack, but then a huge swath of history that had happened since > then. And it was that newer part of the graph that we had to walk to > fill in the bitmap. > > So all of this makes pretty good sense to me now. Bitmaps work > incrementally as you add new, un-bitmapped history. But the cost gets > higher and higher the longer you go before repacking and generating new > bitmaps. So your case was very similar to a repo that uses bitmaps but > just hadn't packed in a really long time. > >> I think to the extent that this is an issue we can reproduce in the >> future the proper fix for it in lieu of some easy fix in the bitmap code >> would be to just teach "gc" to unlink old *.bitmap files if we detect >> they're too stale. > > Yeah. This could happen if you simply accumulated history without ever > running "gc". But in general we can probably assume that "gc" will run > periodically (though there is a real blind spot if you push up a very > huge chunk of history in one pack, since gc counts packs, not objects). > > I agree that if gc is deciding to leave a big pack, it should probably > ditch the bitmaps for it. > >> I.e. we don't need to deal gracefully with some case where the bitmaps >> just cover some tiny part of the graph, we can just teach "gc" to either >> update them, or (if we're not currently writing them) unlink them. > > Unfortunately I don't think we can update them, because all of the > reachable objects need to be in the same pack. So any scheme that isn't > doing a periodic all-into-one repack probably shouldn't be using > bitmaps. I wonder if we need to tweak Eric's bitmaps-by-default logic to > better account for this. I mean either we'd update them via repacking, or have some heuristic to do away with them. >> That seems to me to be a good idea in general, not just with bitmaps but >> also the commit graph. If we're doing a GC and our current settings >> aren't such that we'd update those files, shouldn't we just unlink them? > > I don't think the commit graph would suffer from this. It's not tied to > a specific pack, so it can be freely updated on any gc. You still have > the problem when gc does not run. It's possible that auto-gc should have > separate thresholds for different activities (e.g., number of packs > should tell us when to repack objects, number of loose refs should tell > us when to pack refs, number of un-annotated commits should tell us when > to update the commit graph). The trouble is that some of those checks > are non-trivial. > > In the long run, I think the plan is for the commit graph to allow cheap > incremental updating, which may make it reasonable to just update it > preemptively after every fetch/push. I don't think it's a performance problem to have an old commit-graph lying around. But if you turn on the commit-graph, run gc a bunch, then turn it off in config we'll have it lying around forever, even if you do subsequent gc's. So I think we should delete such things on the general principle that the end-state of a gc's shouldn't be the accumulation of the values of past configuration options if we can help it. Maybe that screws over other users who did a "commit-graph write" without setting gc.writeCommitGraph, but I think the only sane thing to do is to make "gc" fully 'own' such things if its turned on at all. We don't worry e.g. that we can't repack some recent pack because a user might have just manually produced it with handcrafted love seconds earlier. Some of what you bring up is "let's do incremental GC". I agree, but think that can be considered separately from what GC should do *when* it runs, whether that's a "full" or "partial" run.
On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote: > I don't think it's a performance problem to have an old commit-graph > lying around. But if you turn on the commit-graph, run gc a bunch, then > turn it off in config we'll have it lying around forever, even if you do > subsequent gc's. > > So I think we should delete such things on the general principle that > the end-state of a gc's shouldn't be the accumulation of the values of > past configuration options if we can help it. > > Maybe that screws over other users who did a "commit-graph write" > without setting gc.writeCommitGraph, but I think the only sane thing to > do is to make "gc" fully 'own' such things if its turned on at all. Note that there is 'core.commitGraph' as well; as long as it's enabled, no commit-graph files should be deleted.
On 5/24/2019 3:24 AM, Jeff King wrote: > On Thu, May 23, 2019 at 08:53:56AM -0400, Derrick Stolee wrote: > >>> I spent a while thinking and experimenting with this tonight. The result >>> is the patch below. Ævar, do you still have a copy of the repo that >>> misbehaved? I'd be curious to see how it fares. >> >> This patch caught my attention, and I think I understand some of the issues >> at hand. I'm not well-versed specifically in Git's bitmap implementation. >> The fundamental ideas are there, but my perspective is biased by my own >> independent bitmap implementation for Azure Repos. What worked there may not >> work at all here. > > Thanks for looking at this. There are a lot of short-comings in the > current bitmap implementation, so if there's a better way to do things, > I'm not opposed to moving to a new format. :) > >>> Finding the right trees to explore is a little tricky with bitmaps. In >>> a normal traversal, we consider the "edges" to be worth exploring. >>> I.e., the places where an UNINTERESTING commit is the parent of an >>> interesting one. >> >> This is the "commit frontier" which I bring up again below. > > Right. I actually had trouble coming up with a succinct way of > describing this, and stole the definition from your recent blog post. ;) > >>> But with bitmaps, we don't have that information in the same way, >>> because we're trying to avoid walking the commits in the first place. So >>> e.g., in a history like this: >>> >>> A--B--C >>> \ >>> D >>> >>> Let's imagine we're computing "C..D", and that D has a bitmap on disk >>> but C does not. >> >> (As I read your discussion below, I'm confused. For "C..D", C is a have >> and D is a want. We should explore all the haves _first_, then walk the >> wants, right?) > > I think I may have confused things by starting my description with a > hypothetical combined want/have walk. To take a step back: the problem > we were discussing is that we spend a lot of time reading trees to fill > in the "have" bitmap, even though most of those objects are unlikely to > be in the "want" in the first place (only the frontier trees are really > useful). Thank you for resolving my confusion. [snip] > As I said earlier in the thread, I suspect our commit selection is not > great. It's mostly some heuristics we threw together in 2013, and I > don't think it was tested very thoroughly. So fixing that may be a > simpler approach. It's a hard problem! There are no _sure_ answers here, and what works in some cases will probably not work in other cases. > What I was wondering here was whether we could get an easy fix based on > the same frontier ideas that the non-bitmap walk uses. [snip] > But doing that commit walk to find the frontier negates part of the > purpose of using the bitmaps, which is avoiding even walking commits. > Going back to a variant of our example: > > A -- B -- C_1 -- .. -- C_1000 > \ > D_1 -- .. -- D_1000 > > If we have a bitmap at C_1000 and D_1000, we don't have to walk any > commits at all. But finding the frontier requires walking 2000 commits. In my opinion, walking commits is easy (easier with the commit-graph) and walking trees is hard. We probably _should_ do more commit walking if it helps avoid walking some trees. > Is there a way to find it with just bitmaps? You can certainly find the > intersection, but you don't have any idea which of the many shared > commits is the merge base. Of course in this example you don't actually > care about the frontier (you have the full answer immediately). But how > do you decide which way to optimize: try to avoid walking commits to > get a quick answer from bitmaps, or prefer to walk some commits to find > the frontier and avoid opening too many trees? With a new perspective on the problem, I think perhaps trying to solve this problem with bitmaps is incorrect. If you want to use bitmaps for C..D and you don't have any bitmaps in the range D..C, then maybe we should just use the old-fashioned method of walking trees? In your examples above, the new method would walk trees for the commits in {B, C_i, D_j} while the bitmap method would walk trees for the commits in {B, C_i, A_k}. I would expect the list of {A_k} commits to be the largest in most cases. The approach here would be to do the commit frontier walk, and check for commits with bitmaps along the way. If we don't find an UNINTERESTING bitmap, then use the non-bitmap way instead. Perhaps worth a shot. I'll bring up this code snippet again: /* * if we have a HAVES list, but none of those haves is contained * in the packfile that has a bitmap, we don't have anything to * optimize here */ if (haves && !in_bitmapped_pack(bitmap_git, haves)) goto cleanup; In addition to this, we can fill the "haves" set with the commits in D..C (with boundary) and then check if any of those commits have a precomputed bitmap. If not, goto cleanup. Thanks, -Stolee
On Fri, May 24 2019, SZEDER Gábor wrote: > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote: >> I don't think it's a performance problem to have an old commit-graph >> lying around. But if you turn on the commit-graph, run gc a bunch, then >> turn it off in config we'll have it lying around forever, even if you do >> subsequent gc's. >> >> So I think we should delete such things on the general principle that >> the end-state of a gc's shouldn't be the accumulation of the values of >> past configuration options if we can help it. >> >> Maybe that screws over other users who did a "commit-graph write" >> without setting gc.writeCommitGraph, but I think the only sane thing to >> do is to make "gc" fully 'own' such things if its turned on at all. > > Note that there is 'core.commitGraph' as well; as long as it's > enabled, no commit-graph files should be deleted. Why? If we won't update it or write it if it's not there, why keep it around? It means the commit-graph code and anything else (like bitmaps) needs to deal with stale data for the common and default gc --auto case. You also can't have e.g. a global core.commitGraph=true config along with a per-repo gc.writeCommitGraph=true config do what you expect. Now just because you wanted to write it for some you'll end up keeping it around forever because you'd also want to optimistically always use it if it's there. Note that I'm talking about the *default* gc semantics, they don't have to cover all advanced use-cases, just be good enough for most, and it's also important that they're as simple as possible, and don't result in stuff like "my performance sucks because I turned this config option on once a year ago for 2 days".
On Fri, May 24, 2019 at 01:17:06PM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Fri, May 24 2019, SZEDER Gábor wrote: > > > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> I don't think it's a performance problem to have an old commit-graph > >> lying around. But if you turn on the commit-graph, run gc a bunch, then > >> turn it off in config we'll have it lying around forever, even if you do > >> subsequent gc's. > >> > >> So I think we should delete such things on the general principle that > >> the end-state of a gc's shouldn't be the accumulation of the values of > >> past configuration options if we can help it. > >> > >> Maybe that screws over other users who did a "commit-graph write" > >> without setting gc.writeCommitGraph, but I think the only sane thing to > >> do is to make "gc" fully 'own' such things if its turned on at all. > > > > Note that there is 'core.commitGraph' as well; as long as it's > > enabled, no commit-graph files should be deleted. > > Why? If we won't update it or write it if it's not there, why keep it > around? To read it, if 'core.commitGraph' says that is should be read. > It means the commit-graph code and anything else (like bitmaps) needs to > deal with stale data for the common and default gc --auto case. > > You also can't have e.g. a global core.commitGraph=true config along > with a per-repo gc.writeCommitGraph=true config do what you expect. > > Now just because you wanted to write it for some you'll end up keeping > it around forever because you'd also want to optimistically always use > it if it's there. This is exactly what I expect it to do. > Note that I'm talking about the *default* gc semantics, they don't have > to cover all advanced use-cases, just be good enough for most, and it's > also important that they're as simple as possible, and don't result in > stuff like "my performance sucks because I turned this config option on > once a year ago for 2 days".
On Fri, May 24 2019, SZEDER Gábor wrote: > On Fri, May 24, 2019 at 01:17:06PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Fri, May 24 2019, SZEDER Gábor wrote: >> >> > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> I don't think it's a performance problem to have an old commit-graph >> >> lying around. But if you turn on the commit-graph, run gc a bunch, then >> >> turn it off in config we'll have it lying around forever, even if you do >> >> subsequent gc's. >> >> >> >> So I think we should delete such things on the general principle that >> >> the end-state of a gc's shouldn't be the accumulation of the values of >> >> past configuration options if we can help it. >> >> >> >> Maybe that screws over other users who did a "commit-graph write" >> >> without setting gc.writeCommitGraph, but I think the only sane thing to >> >> do is to make "gc" fully 'own' such things if its turned on at all. >> > >> > Note that there is 'core.commitGraph' as well; as long as it's >> > enabled, no commit-graph files should be deleted. >> >> Why? If we won't update it or write it if it's not there, why keep it >> around? > > To read it, if 'core.commitGraph' says that is should be read. > >> It means the commit-graph code and anything else (like bitmaps) needs to >> deal with stale data for the common and default gc --auto case. >> >> You also can't have e.g. a global core.commitGraph=true config along >> with a per-repo gc.writeCommitGraph=true config do what you expect. >> >> Now just because you wanted to write it for some you'll end up keeping >> it around forever because you'd also want to optimistically always use >> it if it's there. > > This is exactly what I expect it to do. Do you also expect base packs with an associated bitmap to have an implicit *.keep flag under gc with pack.writeBitmaps=false and pack.useBitmaps=true? >> Note that I'm talking about the *default* gc semantics, they don't have >> to cover all advanced use-cases, just be good enough for most, and it's >> also important that they're as simple as possible, and don't result in >> stuff like "my performance sucks because I turned this config option on >> once a year ago for 2 days".
On Fri, May 24, 2019 at 01:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote: > > On Fri, May 24 2019, SZEDER Gábor wrote: > > > On Fri, May 24, 2019 at 01:17:06PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > >> On Fri, May 24 2019, SZEDER Gábor wrote: > >> > >> > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote: > >> >> I don't think it's a performance problem to have an old commit-graph > >> >> lying around. But if you turn on the commit-graph, run gc a bunch, then > >> >> turn it off in config we'll have it lying around forever, even if you do > >> >> subsequent gc's. > >> >> > >> >> So I think we should delete such things on the general principle that > >> >> the end-state of a gc's shouldn't be the accumulation of the values of > >> >> past configuration options if we can help it. > >> >> > >> >> Maybe that screws over other users who did a "commit-graph write" > >> >> without setting gc.writeCommitGraph, but I think the only sane thing to > >> >> do is to make "gc" fully 'own' such things if its turned on at all. > >> > > >> > Note that there is 'core.commitGraph' as well; as long as it's > >> > enabled, no commit-graph files should be deleted. > >> > >> Why? If we won't update it or write it if it's not there, why keep it > >> around? > > > > To read it, if 'core.commitGraph' says that is should be read. > > > >> It means the commit-graph code and anything else (like bitmaps) needs to > >> deal with stale data for the common and default gc --auto case. > >> > >> You also can't have e.g. a global core.commitGraph=true config along > >> with a per-repo gc.writeCommitGraph=true config do what you expect. > >> > >> Now just because you wanted to write it for some you'll end up keeping > >> it around forever because you'd also want to optimistically always use > >> it if it's there. > > > > This is exactly what I expect it to do. > > Do you also expect base packs with an associated bitmap to have an > implicit *.keep flag under gc with pack.writeBitmaps=false and > pack.useBitmaps=true? I don't understand what an "implicit *.keep flag" is. However, since a reachability bitmap is always associated with a pack, but the commit-graph is not, I don't think this is a valid comparison. > >> Note that I'm talking about the *default* gc semantics, they don't have > >> to cover all advanced use-cases, just be good enough for most, and it's > >> also important that they're as simple as possible, and don't result in > >> stuff like "my performance sucks because I turned this config option on > >> once a year ago for 2 days".
On Fri, May 24 2019, SZEDER Gábor wrote: > On Fri, May 24, 2019 at 01:58:03PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> On Fri, May 24 2019, SZEDER Gábor wrote: >> >> > On Fri, May 24, 2019 at 01:17:06PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> >> >> On Fri, May 24 2019, SZEDER Gábor wrote: >> >> >> >> > On Fri, May 24, 2019 at 11:01:39AM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> >> I don't think it's a performance problem to have an old commit-graph >> >> >> lying around. But if you turn on the commit-graph, run gc a bunch, then >> >> >> turn it off in config we'll have it lying around forever, even if you do >> >> >> subsequent gc's. >> >> >> >> >> >> So I think we should delete such things on the general principle that >> >> >> the end-state of a gc's shouldn't be the accumulation of the values of >> >> >> past configuration options if we can help it. >> >> >> >> >> >> Maybe that screws over other users who did a "commit-graph write" >> >> >> without setting gc.writeCommitGraph, but I think the only sane thing to >> >> >> do is to make "gc" fully 'own' such things if its turned on at all. >> >> > >> >> > Note that there is 'core.commitGraph' as well; as long as it's >> >> > enabled, no commit-graph files should be deleted. >> >> >> >> Why? If we won't update it or write it if it's not there, why keep it >> >> around? >> > >> > To read it, if 'core.commitGraph' says that is should be read. >> > >> >> It means the commit-graph code and anything else (like bitmaps) needs to >> >> deal with stale data for the common and default gc --auto case. >> >> >> >> You also can't have e.g. a global core.commitGraph=true config along >> >> with a per-repo gc.writeCommitGraph=true config do what you expect. >> >> >> >> Now just because you wanted to write it for some you'll end up keeping >> >> it around forever because you'd also want to optimistically always use >> >> it if it's there. >> > >> > This is exactly what I expect it to do. >> >> Do you also expect base packs with an associated bitmap to have an >> implicit *.keep flag under gc with pack.writeBitmaps=false and >> pack.useBitmaps=true? > > I don't understand what an "implicit *.keep flag" is[...] A .keep means we keep the pack, and e.g. gc.bigPackThreshold is effectively an implicit *.keep flag on a pack matching some critera, which in this case caused this issue of a stale *.bitmap file (since the pack wasn't touched, neither was the bitmap). > [...]However, since a reachability bitmap is always associated with a > pack, but the commit-graph is not, I don't think this is a valid > comparison. I don't either, I'm just trying to understand where you're coming from, and I still don't. That bitmaps are associated with specific packs and the commit graph isn't is an internal implementation detail. Users who care about that distinction either don't use "git gc" or would be willing to tweak its settings. For the rest I think removing existing side-indexes is a good default for the practical reasons mentioned upthread. >> >> Note that I'm talking about the *default* gc semantics, they don't have >> >> to cover all advanced use-cases, just be good enough for most, and it's >> >> also important that they're as simple as possible, and don't result in >> >> stuff like "my performance sucks because I turned this config option on >> >> once a year ago for 2 days".
diff --git a/Documentation/config/repack.txt b/Documentation/config/repack.txt index a5c37813fd..9c413e177e 100644 --- a/Documentation/config/repack.txt +++ b/Documentation/config/repack.txt @@ -24,4 +24,4 @@ repack.writeBitmaps:: packs created for clones and fetches, at the cost of some disk space and extra time spent on the initial repack. This has no effect if multiple packfiles are created. - Defaults to false. + Defaults to true on bare repos, false otherwise. diff --git a/builtin/repack.c b/builtin/repack.c index 67f8978043..caca113927 100644 --- a/builtin/repack.c +++ b/builtin/repack.c @@ -14,7 +14,7 @@ static int delta_base_offset = 1; static int pack_kept_objects = -1; -static int write_bitmaps; +static int write_bitmaps = -1; static int use_delta_islands; static char *packdir, *packtmp; @@ -343,6 +343,9 @@ int cmd_repack(int argc, const char **argv, const char *prefix) (unpack_unreachable || (pack_everything & LOOSEN_UNREACHABLE))) die(_("--keep-unreachable and -A are incompatible")); + if (write_bitmaps < 0) + write_bitmaps = (pack_everything & ALL_INTO_ONE) && + is_bare_repository(); if (pack_kept_objects < 0) pack_kept_objects = write_bitmaps; diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh index 6162e2a8e6..86d05160a3 100755 --- a/t/t7700-repack.sh +++ b/t/t7700-repack.sh @@ -221,5 +221,22 @@ test_expect_success 'repack --keep-pack' ' ) ' -test_done +test_expect_success 'bitmaps are created by default in bare repos' ' + git clone --bare .git bare.git && + git -C bare.git repack -ad && + bitmap=$(ls bare.git/objects/pack/*.bitmap) && + test_path_is_file "$bitmap" +' + +test_expect_success 'incremental repack does not complain' ' + git -C bare.git repack -q 2>repack.err && + test_must_be_empty repack.err +' +test_expect_success 'bitmaps can be disabled on bare repos' ' + git -c repack.writeBitmaps=false -C bare.git repack -ad && + bitmap=$(ls bare.git/objects/pack/*.bitmap 2>/dev/null || :) && + test -z "$bitmap" +' + +test_done