Message ID | cover.1631730270.git.me@ttaylorr.com (mailing list archive) |
---|---|
Headers | show |
Series | repack: introduce `--write-midx` | expand |
Taylor Blau <me@ttaylorr.com> writes: > Nothing substantial has changed since last time. Mostly re-wrapping lines and > removing braces in macro'd for_each_xyz() loops.... > Otherwise, this series is unchagned. It depends on tb/multi-pack-bitmaps, which > has graduated to master. Range-diff below: Despite the explanation above ... > @@ midx.c: static void add_pack_to_midx(const char *full_path, size_t full_path_len, > - > - if (ends_with(file_name, ".idx")) { > display_progress(ctx->progress, ++ctx->pack_paths_checked); > -- if (ctx->m && midx_contains_pack(ctx->m, file_name)) > -- return; > -+ if (ctx->m) { > -+ if (midx_contains_pack(ctx->m, file_name)) > -+ return; > -+ } else if (ctx->to_include) { > -+ if (!string_list_has_string(ctx->to_include, file_name)) > -+ return; > -+ } > + if (ctx->m && midx_contains_pack(ctx->m, file_name)) > + return; > ++ else if (ctx->to_include && > ++ !string_list_has_string(ctx->to_include, file_name)) > ++ return; ... this part seems to change what the code does quite a bit. The original used to skip .to_include checks when .m is set but it did not pass midx_contains_pack(). Now .to_include check is done in a context with .m that does not pass midex_contains_pack() check. I suspect that this change since v1 is a bugfix. We make an early return based on midx_contains_pack() but make sure .m is not NULL, in order to be able to run the check. There is another early return condition that is orthogonal, and we do string_list check, but that is only doable if the .to_include is not NULL. The logic in v2 clearly expresses that, but v1 was simply buggy. But if it is the case, I'd step back a bit and further question if "else if" is a good construct to use here. We'd return if .m passes midx_contains_pack() check, and another check based on .to_include gives us an orthogonal chance to return early, so two "if" statement that are independent sitting next to each other may have avoided such a bug from the beginning, perhaps? if (ctx->m && midx_contains...) return; if (ctx->to_include && !string_list_has...) return; Of course you could if ((ctx->m && midx_contains...) || (ctx->to_include && !string_list_has...)) return; but that may not scale as well as two independent if statements. Everything else in the range-diff looked cosmetic as explained in the cover letter.
Junio C Hamano <gitster@pobox.com> writes: > But if it is the case, I'd step back a bit and further question if > "else if" is a good construct to use here. We'd return if .m passes > midx_contains_pack() check, and another check based on .to_include > gives us an orthogonal chance to return early, so two "if" statement > that are independent sitting next to each other may have avoided > such a bug from the beginning, perhaps? OK, I went back and checked your response to a review in an earlier round. If .m and .to_include cannot be turned on at the same time, then I think "else if" would express the intention more clearly. But if we go that route, the whole "if ... else if" may deserve a comment that explains why .m and .to_include are fundamentally and inherently mutually exclusive. In other words, is it possible if future enhancement may want to pass both .m and .to_include to allow the code path to check both conditions and return early? Thanks.
On Wed, Sep 15, 2021 at 12:29:20PM -0700, Junio C Hamano wrote: > Junio C Hamano <gitster@pobox.com> writes: > > > But if it is the case, I'd step back a bit and further question if > > "else if" is a good construct to use here. We'd return if .m passes > > midx_contains_pack() check, and another check based on .to_include > > gives us an orthogonal chance to return early, so two "if" statement > > that are independent sitting next to each other may have avoided > > such a bug from the beginning, perhaps? > > OK, I went back and checked your response to a review in an earlier > round. If .m and .to_include cannot be turned on at the same time, > then I think "else if" would express the intention more clearly. Yeah, exactly. I didn't think that this would be as interesting to reviewers as it ended up being, hence I didn't put much thought into it. > But if we go that route, the whole "if ... else if" may deserve a > comment that explains why .m and .to_include are fundamentally and > inherently mutually exclusive. In other words, is it possible if > future enhancement may want to pass both .m and .to_include to allow > the code path to check both conditions and return early? The gist is that they aren't inherently exclusive, but that using an existing MIDX (which is the result of having `.m` point at a MIDX struct) right now blindly reuses all of the existing packs in that MIDX, which is what to_include is trying to avoid. We could reuse an existing MIDX with to_include by filtering down its packs before carrying them forward, but don't currently. So that would cause this change to produce unexpected results if we ever changed that behavior. I think all of this is non-obvious enough that it warrants being written down in a comment. Here's a replacement for that first patch that should apply without conflict. But if you'd rather have a reroll or anything else, I'm happy to do that, too. --- >8 --- Subject: [PATCH] midx: expose `write_midx_file_only()` publicly Expose a variant of the write_midx_file() function which ignores packs that aren't included in an explicit "allow" list. This will be used in an upcoming patch to power a new `--stdin-packs` mode of `git multi-pack-index write` for callers that only want to include certain packs in a MIDX (and ignore any packs which may have happened to enter the repository independently, e.g., from pushes). Those patches will provide test coverage for this new function. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- midx.c | 59 +++++++++++++++++++++++++++++++++++++++++++++++++--------- midx.h | 5 +++++ 2 files changed, 55 insertions(+), 9 deletions(-) diff --git a/midx.c b/midx.c index 864034a6ad..61226eaa9d 100644 --- a/midx.c +++ b/midx.c @@ -475,6 +475,8 @@ struct write_midx_context { uint32_t num_large_offsets; int preferred_pack_idx; + + struct string_list *to_include; }; static void add_pack_to_midx(const char *full_path, size_t full_path_len, @@ -484,8 +486,26 @@ static void add_pack_to_midx(const char *full_path, size_t full_path_len, if (ends_with(file_name, ".idx")) { display_progress(ctx->progress, ++ctx->pack_paths_checked); + /* + * Note that at most one of ctx->m and ctx->to_include are set, + * so we are testing midx_contains_pack() and + * string_list_has_string() independently (guarded by the + * appropriate NULL checks). + * + * We could support passing to_include while reusing an existing + * MIDX, but don't currently since the reuse process drags + * forward all packs from an existing MIDX (without checking + * whether or not they appear in the to_include list). + * + * If we added support for that, these next two conditional + * should be performed independently (likely checking + * to_include before the existing MIDX). + */ if (ctx->m && midx_contains_pack(ctx->m, file_name)) return; + else if (ctx->to_include && + !string_list_has_string(ctx->to_include, file_name)) + return; ALLOC_GROW(ctx->info, ctx->nr + 1, ctx->alloc); @@ -1058,6 +1078,7 @@ static int write_midx_bitmap(char *midx_name, unsigned char *midx_hash, } static int write_midx_internal(const char *object_dir, + struct string_list *packs_to_include, struct string_list *packs_to_drop, const char *preferred_pack_name, unsigned flags) @@ -1082,10 +1103,17 @@ static int write_midx_internal(const char *object_dir, die_errno(_("unable to create leading directories of %s"), midx_name); - for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) { - if (!strcmp(object_dir, cur->object_dir)) { - ctx.m = cur; - break; + if (!packs_to_include) { + /* + * Only reference an existing MIDX when not filtering which + * packs to include, since all packs and objects are copied + * blindly from an existing MIDX if one is present. + */ + for (cur = get_multi_pack_index(the_repository); cur; cur = cur->next) { + if (!strcmp(object_dir, cur->object_dir)) { + ctx.m = cur; + break; + } } } @@ -1136,10 +1164,13 @@ static int write_midx_internal(const char *object_dir, else ctx.progress = NULL; + ctx.to_include = packs_to_include; + for_each_file_in_pack_dir(object_dir, add_pack_to_midx, &ctx); stop_progress(&ctx.progress); - if (ctx.m && ctx.nr == ctx.m->num_packs && !packs_to_drop) { + if ((ctx.m && ctx.nr == ctx.m->num_packs) && + !(packs_to_include || packs_to_drop)) { struct bitmap_index *bitmap_git; int bitmap_exists; int want_bitmap = flags & MIDX_WRITE_BITMAP; @@ -1237,7 +1268,7 @@ static int write_midx_internal(const char *object_dir, QSORT(ctx.info, ctx.nr, pack_info_compare); - if (packs_to_drop && packs_to_drop->nr) { + if (ctx.m && packs_to_drop && packs_to_drop->nr) { int drop_index = 0; int missing_drops = 0; @@ -1380,7 +1411,17 @@ int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags) { - return write_midx_internal(object_dir, NULL, preferred_pack_name, flags); + return write_midx_internal(object_dir, NULL, NULL, preferred_pack_name, + flags); +} + +int write_midx_file_only(const char *object_dir, + struct string_list *packs_to_include, + const char *preferred_pack_name, + unsigned flags) +{ + return write_midx_internal(object_dir, packs_to_include, NULL, + preferred_pack_name, flags); } struct clear_midx_data { @@ -1660,7 +1701,7 @@ int expire_midx_packs(struct repository *r, const char *object_dir, unsigned fla free(count); if (packs_to_drop.nr) { - result = write_midx_internal(object_dir, &packs_to_drop, NULL, flags); + result = write_midx_internal(object_dir, NULL, &packs_to_drop, NULL, flags); m = NULL; } @@ -1851,7 +1892,7 @@ int midx_repack(struct repository *r, const char *object_dir, size_t batch_size, goto cleanup; } - result = write_midx_internal(object_dir, NULL, NULL, flags); + result = write_midx_internal(object_dir, NULL, NULL, NULL, flags); m = NULL; cleanup: diff --git a/midx.h b/midx.h index aa3da557bb..80f502d39b 100644 --- a/midx.h +++ b/midx.h @@ -2,6 +2,7 @@ #define MIDX_H #include "repository.h" +#include "string-list.h" struct object_id; struct pack_entry; @@ -62,6 +63,10 @@ int midx_contains_pack(struct multi_pack_index *m, const char *idx_or_pack_name) int prepare_multi_pack_index_one(struct repository *r, const char *object_dir, int local); int write_midx_file(const char *object_dir, const char *preferred_pack_name, unsigned flags); +int write_midx_file_only(const char *object_dir, + struct string_list *packs_to_include, + const char *preferred_pack_name, + unsigned flags); void clear_midx_file(struct repository *r); int verify_midx_file(struct repository *r, const char *object_dir, unsigned flags); int expire_midx_packs(struct repository *r, const char *object_dir, unsigned flags); -- 2.33.0.96.g73915697e6
Taylor Blau <me@ttaylorr.com> writes: > I think all of this is non-obvious enough that it warrants being written > down in a comment. Here's a replacement for that first patch that should > apply without conflict. But if you'd rather have a reroll or anything > else, I'm happy to do that, too. Thanks; will replace. The comment reads well.