Message ID | cover.1632880469.git.me@ttaylorr.com (mailing list archive) |
---|---|
Headers | show |
Series | repack: introduce `--write-midx` | expand |
Taylor Blau <me@ttaylorr.com> writes: > Here is another small reroll of my series to implement the final component of > multi-pack reachability bitamps, which is to be able to write them from `git > repack`. > > This version incorporates feedback from Jonathan Tan and others at Google who > graciously provided review. A range-diff is included below, but the major > changes are: > > - A comment to explain the relationship between 'ctx->m' and 'ctx->to_include' > in midx.c:add_pack_to_midx(). > > - A comment to explain the meaning of write_midx_file_only(). > > - Clean-up of stray hunks, a missing strbuf_release(). > > - Replacing the name `existing_packs` with `existing_nonkept_packs` to > emphasize the relationship between it and the similarly-named > `existing_kept_packs`. > > Instead of depending on tb/multi-pack-bitmaps, this series now depends on the > tip of master. ... as the topic already graduated to 'master' ;-) But applying them on the same base as the previous round and merging the result to 'master', and applying them on 'master', will give us identical trees. Will replace; thanks.
> Here is another small reroll of my series to implement the final component of > multi-pack reachability bitamps, which is to be able to write them from `git > repack`. I see that all my comments up to patch 6 (of v2) have been addressed. There is one outstanding one involving a potentially uninitialized variable [1] from patch 8 (of v2) that you (Taylor) might have missed since I sent it after I reviewed the first 6. Other than that, Reviewed-by: Jonathan Tan <jonathantanmy@google.com> (conditional on that being resolved) [1] https://lore.kernel.org/git/20210924182247.2922561-1-jonathantanmy@google.com/
On Fri, Oct 01, 2021 at 01:01:35PM -0700, Jonathan Tan wrote: > > Here is another small reroll of my series to implement the final component of > > multi-pack reachability bitamps, which is to be able to write them from `git > > repack`. > > I see that all my comments up to patch 6 (of v2) have been addressed. > There is one outstanding one involving a potentially uninitialized > variable [1] from patch 8 (of v2) that you (Taylor) might have missed > since I sent it after I reviewed the first 6. Thanks! I must have missed your review of v2 when I sent v3 (indeed, my reroll came 5 days after you sent [1], so I missed it). I sent a replacement patch here: https://lore.kernel.org/git/YVeN0mXqYvTHtNB+@nand.local/ which Junio can take instead of [v3 8/8]. But I'm happy to reroll the whole series, too, whatever is easier. > Reviewed-by: Jonathan Tan <jonathantanmy@google.com> (conditional on > that being resolved) Thanks again :-). > [1] https://lore.kernel.org/git/20210924182247.2922561-1-jonathantanmy@google.com/ Thanks, Taylor
Here is another small reroll of my series to implement the final component of multi-pack reachability bitamps, which is to be able to write them from `git repack`. This version incorporates feedback from Jonathan Tan and others at Google who graciously provided review. A range-diff is included below, but the major changes are: - A comment to explain the relationship between 'ctx->m' and 'ctx->to_include' in midx.c:add_pack_to_midx(). - A comment to explain the meaning of write_midx_file_only(). - Clean-up of stray hunks, a missing strbuf_release(). - Replacing the name `existing_packs` with `existing_nonkept_packs` to emphasize the relationship between it and the similarly-named `existing_kept_packs`. Instead of depending on tb/multi-pack-bitmaps, this series now depends on the tip of master. Taylor Blau (9): midx: expose `write_midx_file_only()` publicly builtin/multi-pack-index.c: support `--stdin-packs` mode midx: preliminary support for `--refs-snapshot` builtin/repack.c: keep track of existing packs unconditionally builtin/repack.c: rename variables that deal with non-kept packs builtin/repack.c: extract showing progress to a variable builtin/repack.c: support writing a MIDX while repacking builtin/repack.c: make largest pack preferred builtin/repack.c: pass `--refs-snapshot` when writing bitmaps Documentation/git-multi-pack-index.txt | 19 ++ Documentation/git-repack.txt | 18 +- builtin/multi-pack-index.c | 36 +++- builtin/repack.c | 279 ++++++++++++++++++++++--- midx.c | 110 ++++++++-- midx.h | 15 +- pack-bitmap.c | 2 +- pack-bitmap.h | 1 + t/helper/test-read-midx.c | 25 ++- t/lib-midx.sh | 8 + t/t5319-multi-pack-index.sh | 15 ++ t/t5326-multi-pack-bitmaps.sh | 82 ++++++++ t/t7700-repack.sh | 96 +++++++++ t/t7703-repack-geometric.sh | 24 ++- 14 files changed, 674 insertions(+), 56 deletions(-) create mode 100644 t/lib-midx.sh Range-diff against v2: 1: 03c1b2c4d3 ! 1: b7c72d0bf5 midx: expose `write_midx_file_only()` publicly @@ midx.c: struct write_midx_context { static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ midx.c: static void add_pack_to_midx(const char *full_path, size_t full_path_len, + + if (ends_with(file_name, ".idx")) { display_progress(ctx->progress, ++ctx->pack_paths_checked); ++ /* ++ * Note that at most one of ctx->m and ctx->to_include are set, ++ * so we are testing midx_contains_pack() and ++ * string_list_has_string() independently (guarded by the ++ * appropriate NULL checks). ++ * ++ * We could support passing to_include while reusing an existing ++ * MIDX, but don't currently since the reuse process drags ++ * forward all packs from an existing MIDX (without checking ++ * whether or not they appear in the to_include list). ++ * ++ * If we added support for that, these next two conditional ++ * should be performed independently (likely checking ++ * to_include before the existing MIDX). ++ */ if (ctx->m && midx_contains_pack(ctx->m, file_name)) return; + else if (ctx->to_include && @@ midx.c: static int write_midx_internal(const char *object_dir, struct bitmap_index *bitmap_git; int bitmap_exists; int want_bitmap = flags & MIDX_WRITE_BITMAP; -@@ midx.c: static int write_midx_internal(const char *object_dir, - - QSORT(ctx.info, ctx.nr, pack_info_compare); - -- if (packs_to_drop && packs_to_drop->nr) { -+ if (ctx.m && packs_to_drop && packs_to_drop->nr) { - int drop_index = 0; - int missing_drops = 0; - @@ midx.c: int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags) @@ midx.h: int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pa int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags); ++/* ++ * Variant of write_midx_file which writes a MIDX containing only the packs ++ * specified in packs_to_include. ++ */ +int write_midx_file_only(const char *object_dir, + struct string_list *packs_to_include, + const char *preferred_pack_name, 2: 59556e5545 = 2: 986ef14f2a builtin/multi-pack-index.c: support `--stdin-packs` mode 3: 42f1ae9ede ! 3: 4e3769a4f3 midx: preliminary support for `--refs-snapshot` @@ Commit message commit in the MIDX reaches something that isn't. This can happen when a multi-pack index contains some pack which refers - to loose objects (which by definition aren't included in the multi-pack - index). + to loose objects (e.g., if a pack was pushed after starting the repack + but before generating the MIDX which depends on an object which is + stored as loose in the repository, and by definition isn't included in + the multi-pack index). By taking a snapshot of the references before we start repacking, we can close that race window. In the above scenario (where we have a packed @@ midx.c: static void bitmap_show_commit(struct commit *commit, void *_data) + } + + fclose(f); ++ strbuf_release(&buf); + return 0; +} + @@ midx.h: int fill_midx_entry(struct repository *r, const struct object_id *oid, s int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); -int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags); + /* + * Variant of write_midx_file which writes a MIDX containing only the packs + * specified in packs_to_include. + */ +int write_midx_file(const char *object_dir, + const char *preferred_pack_name, + const char *refs_snapshot, 4: c0d045a9de ! 4: 1b3dd331ca builtin/repack.c: keep track of existing packs unconditionally @@ Commit message ## builtin/repack.c ## @@ builtin/repack.c: static void remove_pack_on_signal(int signo) - * have a corresponding .keep file. These packs are not to - * be kept if we are going to pack everything into one file. + } + + /* +- * Adds all packs hex strings to the fname list, which do not +- * have a corresponding .keep file. These packs are not to +- * be kept if we are going to pack everything into one file. ++ * Adds all packs hex strings to either fname_list or fname_kept_list ++ * based on whether each pack has a corresponding .keep file or not. ++ * Packs without a .keep file are not to be kept if we are going to ++ * pack everything into one file. */ -static void get_non_kept_pack_filenames(struct string_list *fname_list, - const struct string_list *extra_keep) -: ---------- > 5: 15831a201a builtin/repack.c: rename variables that deal with non-kept packs 5: 09de03de47 = 6: 1a40161baf builtin/repack.c: extract showing progress to a variable 6: 83dfdb8b12 ! 7: 6854f0751d builtin/repack.c: support writing a MIDX while repacking @@ Commit message check we can make consistently when (1) telling the MIDX which packs we want to exclude, and (2) actually unlinking the redundant packs. + There is also a tiny change to short-circuit the body of + write_midx_included_packs() when no packs remain in the case of an empty + repository. The MIDX code does not handle this, so avoid trying to + generate a MIDX covering zero packs in the first place. + Signed-off-by: Taylor Blau <me@ttaylorr.com> ## Documentation/git-repack.txt ## @@ builtin/repack.c: static void clear_pack_geometry(struct pack_geometry *geometry } +static void midx_included_packs(struct string_list *include, -+ struct string_list *existing_packs, ++ struct string_list *existing_nonkept_packs, + struct string_list *existing_kept_packs, + struct string_list *names, + struct pack_geometry *geometry) @@ builtin/repack.c: static void clear_pack_geometry(struct pack_geometry *geometry + string_list_insert(include, strbuf_detach(&buf, NULL)); + } + } else { -+ for_each_string_list_item(item, existing_packs) { ++ for_each_string_list_item(item, existing_nonkept_packs) { + if (item->util) + continue; + string_list_insert(include, xstrfmt("%s.idx", item->string)); @@ builtin/repack.c: static void clear_pack_geometry(struct pack_geometry *geometry + FILE *in; + int ret; + ++ if (!include->nr) ++ return 0; ++ + cmd.in = -1; + cmd.git_cmd = 1; + @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix + if (delete_redundant && pack_everything & ALL_INTO_ONE) { + const int hexsz = the_hash_algo->hexsz; + string_list_sort(&names); -+ for_each_string_list_item(item, &existing_packs) { ++ for_each_string_list_item(item, &existing_nonkept_packs) { + char *sha1; + size_t len = strlen(item->string); + if (len < hexsz) + continue; + sha1 = item->string + len - hexsz; ++ /* ++ * Mark this pack for deletion, which ensures that this ++ * pack won't be included in a MIDX (if `--write-midx` ++ * was given) and that we will actually delete this pack ++ * (if `-d` was given). ++ */ + item->util = (void*)(intptr_t)!string_list_has_string(&names, sha1); + } + } + + if (write_midx) { + struct string_list include = STRING_LIST_INIT_NODUP; -+ midx_included_packs(&include, &existing_packs, ++ midx_included_packs(&include, &existing_nonkept_packs, + &existing_kept_packs, &names, geometry); + + ret = write_midx_included_packs(&include, @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix - if (pack_everything & ALL_INTO_ONE) { - const int hexsz = the_hash_algo->hexsz; - string_list_sort(&names); -- for_each_string_list_item(item, &existing_packs) { +- for_each_string_list_item(item, &existing_nonkept_packs) { - char *sha1; - size_t len = strlen(item->string); - if (len < hexsz) @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix - if (!string_list_has_string(&names, sha1)) - remove_redundant_pack(packdir, item->string); - } -+ for_each_string_list_item(item, &existing_packs) { ++ for_each_string_list_item(item, &existing_nonkept_packs) { + if (!item->util) + continue; + remove_redundant_pack(packdir, item->string); @@ t/t7700-repack.sh: test_expect_success 'auto-bitmaps do not complain if unavaila +' + test_done + + ## t/t7703-repack-geometric.sh ## +@@ t/t7703-repack-geometric.sh: test_expect_success '--geometric with no packs' ' + ( + cd geometric && + +- git repack --geometric 2 >out && ++ git repack --write-midx --geometric 2 >out && + test_i18ngrep "Nothing new to pack" out + ) + ' 7: 68bc49d8ae ! 8: 3596c76daf builtin/repack.c: make largest pack preferred @@ builtin/repack.c: static int write_midx_included_packs(struct string_list *inclu if (ret) return ret; @@ builtin/repack.c: int cmd_repack(int argc, const char **argv, const char *prefix) - midx_included_packs(&include, &existing_packs, + midx_included_packs(&include, &existing_nonkept_packs, &existing_kept_packs, &names, geometry); - ret = write_midx_included_packs(&include, 8: eb24b308ec ! 9: d99f075321 builtin/repack.c: pass `--refs-snapshot` when writing bitmaps @@ builtin/repack.c: static void clear_pack_geometry(struct pack_geometry *geometry +} + static void midx_included_packs(struct string_list *include, - struct string_list *existing_packs, + struct string_list *existing_nonkept_packs, struct string_list *existing_kept_packs, @@ builtin/repack.c: static void midx_included_packs(struct string_list *include,