Message ID | 4e883a4d1ccabb35cc6d122f23a475fca0d71ce1.1730122499.git.karthik.188@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | packfile: avoid using the 'the_repository' global variable | expand |
On Mon, Oct 28, 2024 at 02:43:43PM +0100, Karthik Nayak wrote: > diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c > index 0800714267..2b2816c243 100644 > --- a/builtin/pack-objects.c > +++ b/builtin/pack-objects.c > @@ -1529,7 +1529,7 @@ static int want_found_object(const struct object_id *oid, int exclude, > return 0; > if (ignore_packed_keep_in_core && p->pack_keep_in_core) > return 0; > - if (has_object_kept_pack(oid, flags)) > + if (has_object_kept_pack(the_repository, oid, flags)) Do we want to use p->repo here instead of the_repository? I think the answer is "yes" since in this function we are given a pack "p" and want to determine if the given object contained in "p" is useful to pack. If not, we want to search for it among other packs here, likely within the same repository. (Again, probably a moot point here since this is all going to be the_repository anyway, but just thinking aloud...). > } > > @@ -3627,7 +3627,7 @@ static void show_cruft_commit(struct commit *commit, void *data) > > static int cruft_include_check_obj(struct object *obj, void *data UNUSED) > { > - return !has_object_kept_pack(&obj->oid, IN_CORE_KEEP_PACKS); > + return !has_object_kept_pack(the_repository, &obj->oid, IN_CORE_KEEP_PACKS); Here we don't know what pack "obj" is contained in, which makes sense since this is a traversal callback, not something that is iterating over the contents of a particular pack or similar. So using the_repository is right here. Although... should we be using to_pack->repo here over the_repository (in builtin/pack-objects.c)? The rest of the code definitely does *not* do that, but I think probably should. > static int cruft_include_check(struct commit *commit, void *data) > diff --git a/diff.c b/diff.c > index dceac20d18..1d483bdf37 100644 > --- a/diff.c > +++ b/diff.c > @@ -4041,7 +4041,8 @@ static int reuse_worktree_file(struct index_state *istate, > * objects however would tend to be slower as they need > * to be individually opened and inflated. > */ > - if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid)) > + if (!FAST_WORKING_DIRECTORY && !want_file && > + has_object_pack(the_repository, oid)) > return 0; > > /* > diff --git a/list-objects.c b/list-objects.c > index 985d008799..31236a8dc9 100644 > --- a/list-objects.c > +++ b/list-objects.c > @@ -41,7 +41,8 @@ static void show_object(struct traversal_context *ctx, > { > if (!ctx->show_object) > return; > - if (ctx->revs->unpacked && has_object_pack(&object->oid)) > + if (ctx->revs->unpacked && has_object_pack(ctx->revs->repo, > + &object->oid)) > return; > > ctx->show_object(object, name, ctx->show_data); > diff --git a/pack-bitmap.c b/pack-bitmap.c > index 4fa9dfc771..d34ba9909a 100644 > --- a/pack-bitmap.c > +++ b/pack-bitmap.c > @@ -1889,7 +1889,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git, > bitmap_unset(result, i); > > for (i = 0; i < eindex->count; ++i) { > - if (has_object_pack(&eindex->objects[i]->oid)) > + if (has_object_pack(the_repository, &eindex->objects[i]->oid)) Interesting. I think the_repository in practice is fine here, but I might have expected something like bitmap_git->p->repo, or the equivalent for the MIDX case. So I was going to suggest something like: static struct repository *bitmap_repo(const struct bitmap_index *bitmap_git) { if (bitmap_is_midx(bitmap_git)) return bitmap_git->midx->repo; return bitmap_git->pack->repo; } and then rewriting this as: if (has_object_pack(bitmap_repo(bitmap_git), &eindex->objects[i]->oid)) , but we can't do that, because the MIDX structure does not know what repository it belongs to, only the object_dir it resides in! And I think that causes wrinkles earlier in your series that I didn't think of at the time when reviewing, because it seems odd in retrospect that, e.g. we have something like: load_multi_pack_index(the_repository->objects->odb->path, ...); where we pass in the object_dir path directly, but other functions like prepare_midx_pack() that take in a 'struct repository *'. I wonder if we should be initializing the MIDX with a repository pointer, so that it knows what repository it belongs to. I suspect that we will still have to pass in a separate string indicating the object_dir, likely because of the --object-dir quirk I mentioned earlier. But my main thought here is that we should be able to infer from a 'struct bitmap_index *' what repository it belongs to instead of using 'the_repository' here directly. The rest all looks quite reasonable to me. Thanks, Taylor
Taylor Blau <me@ttaylorr.com> writes: > On Mon, Oct 28, 2024 at 02:43:43PM +0100, Karthik Nayak wrote: >> diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c >> index 0800714267..2b2816c243 100644 >> --- a/builtin/pack-objects.c >> +++ b/builtin/pack-objects.c >> @@ -1529,7 +1529,7 @@ static int want_found_object(const struct object_id *oid, int exclude, >> return 0; >> if (ignore_packed_keep_in_core && p->pack_keep_in_core) >> return 0; >> - if (has_object_kept_pack(oid, flags)) >> + if (has_object_kept_pack(the_repository, oid, flags)) > > Do we want to use p->repo here instead of the_repository? I think the > answer is "yes" since in this function we are given a pack "p" and want > to determine if the given object contained in "p" is useful to pack. If > not, we want to search for it among other packs here, likely within the > same repository. > > (Again, probably a moot point here since this is all going to be > the_repository anyway, but just thinking aloud...). > I don't think it is a moot point at all. We do want to move up the layers and cleanup usage of global variables. Reducing the work required definitely gets us there faster. >> } >> >> @@ -3627,7 +3627,7 @@ static void show_cruft_commit(struct commit *commit, void *data) >> >> static int cruft_include_check_obj(struct object *obj, void *data UNUSED) >> { >> - return !has_object_kept_pack(&obj->oid, IN_CORE_KEEP_PACKS); >> + return !has_object_kept_pack(the_repository, &obj->oid, IN_CORE_KEEP_PACKS); > > Here we don't know what pack "obj" is contained in, which makes sense > since this is a traversal callback, not something that is iterating over > the contents of a particular pack or similar. So using the_repository is > right here. > > Although... should we be using to_pack->repo here over the_repository > (in builtin/pack-objects.c)? The rest of the code definitely does *not* > do that, but I think probably should. I think so too, I won't change existing code, but makes sense to do it in our patches. Will amend. > >> static int cruft_include_check(struct commit *commit, void *data) >> diff --git a/diff.c b/diff.c >> index dceac20d18..1d483bdf37 100644 >> --- a/diff.c >> +++ b/diff.c >> @@ -4041,7 +4041,8 @@ static int reuse_worktree_file(struct index_state *istate, >> * objects however would tend to be slower as they need >> * to be individually opened and inflated. >> */ >> - if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid)) >> + if (!FAST_WORKING_DIRECTORY && !want_file && >> + has_object_pack(the_repository, oid)) >> return 0; >> >> /* >> diff --git a/list-objects.c b/list-objects.c >> index 985d008799..31236a8dc9 100644 >> --- a/list-objects.c >> +++ b/list-objects.c >> @@ -41,7 +41,8 @@ static void show_object(struct traversal_context *ctx, >> { >> if (!ctx->show_object) >> return; >> - if (ctx->revs->unpacked && has_object_pack(&object->oid)) >> + if (ctx->revs->unpacked && has_object_pack(ctx->revs->repo, >> + &object->oid)) >> return; >> >> ctx->show_object(object, name, ctx->show_data); >> diff --git a/pack-bitmap.c b/pack-bitmap.c >> index 4fa9dfc771..d34ba9909a 100644 >> --- a/pack-bitmap.c >> +++ b/pack-bitmap.c >> @@ -1889,7 +1889,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git, >> bitmap_unset(result, i); >> >> for (i = 0; i < eindex->count; ++i) { >> - if (has_object_pack(&eindex->objects[i]->oid)) >> + if (has_object_pack(the_repository, &eindex->objects[i]->oid)) > > Interesting. I think the_repository in practice is fine here, but I > might have expected something like bitmap_git->p->repo, or the > equivalent for the MIDX case. > > So I was going to suggest something like: > > static struct repository *bitmap_repo(const struct bitmap_index *bitmap_git) > { > if (bitmap_is_midx(bitmap_git)) > return bitmap_git->midx->repo; > return bitmap_git->pack->repo; > } > > and then rewriting this as: > > if (has_object_pack(bitmap_repo(bitmap_git), &eindex->objects[i]->oid)) > > , but we can't do that, because the MIDX structure does not know what > repository it belongs to, only the object_dir it resides in! > Exactly, I agree it should be achieved from `struct bitmap_index`. Unfortunately we can't with the current state as you noted. > And I think that causes wrinkles earlier in your series that I didn't > think of at the time when reviewing, because it seems odd in retrospect > that, e.g. we have something like: > > load_multi_pack_index(the_repository->objects->odb->path, ...); > > where we pass in the object_dir path directly, but other functions like > prepare_midx_pack() that take in a 'struct repository *'. > > I wonder if we should be initializing the MIDX with a repository > pointer, so that it knows what repository it belongs to. I suspect that > we will still have to pass in a separate string indicating the > object_dir, likely because of the --object-dir quirk I mentioned > earlier. > > But my main thought here is that we should be able to infer from a > 'struct bitmap_index *' what repository it belongs to instead of using > 'the_repository' here directly. > I also think it makes sense to progress to the goal of removing global variables in a way where we primarily focus on a single file/subsystem at a time. And directionally between the bottom <> top levels. This patch series focuses on the `packfile.c` file, so we cleanup the file and remove associated usages of the global variable and try to also follow some form of cleanup as we go. But for other files, it is okay to still rely on the global variables. Slowly when the cleanup phase reaches those files, we can give our focus to those files. So here, it would be nice to have MIDX have a repository pointer too, but I think we'd be overshooting trying to refactor that in this series. So I'd leave it as is and focus on that when we get to cleaning up `pack-bitmap.c`. > The rest all looks quite reasonable to me. > > Thanks, > Taylor
diff --git a/builtin/count-objects.c b/builtin/count-objects.c index 04d80887e0..1e89148ed7 100644 --- a/builtin/count-objects.c +++ b/builtin/count-objects.c @@ -67,7 +67,7 @@ static int count_loose(const struct object_id *oid, const char *path, else { loose_size += on_disk_bytes(st); loose++; - if (verbose && has_object_pack(oid)) + if (verbose && has_object_pack(the_repository, oid)) packed_loose++; } return 0; diff --git a/builtin/fsck.c b/builtin/fsck.c index 7f4e2f0414..bb56eb98ac 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -272,7 +272,7 @@ static void check_reachable_object(struct object *obj) if (!(obj->flags & HAS_OBJ)) { if (is_promisor_object(&obj->oid)) return; - if (has_object_pack(&obj->oid)) + if (has_object_pack(the_repository, &obj->oid)) return; /* it is in pack - forget about it */ printf_ln(_("missing %s %s"), printable_type(&obj->oid, obj->type), diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 0800714267..2b2816c243 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1529,7 +1529,7 @@ static int want_found_object(const struct object_id *oid, int exclude, return 0; if (ignore_packed_keep_in_core && p->pack_keep_in_core) return 0; - if (has_object_kept_pack(oid, flags)) + if (has_object_kept_pack(the_repository, oid, flags)) return 0; } @@ -3627,7 +3627,7 @@ static void show_cruft_commit(struct commit *commit, void *data) static int cruft_include_check_obj(struct object *obj, void *data UNUSED) { - return !has_object_kept_pack(&obj->oid, IN_CORE_KEEP_PACKS); + return !has_object_kept_pack(the_repository, &obj->oid, IN_CORE_KEEP_PACKS); } static int cruft_include_check(struct commit *commit, void *data) diff --git a/diff.c b/diff.c index dceac20d18..1d483bdf37 100644 --- a/diff.c +++ b/diff.c @@ -4041,7 +4041,8 @@ static int reuse_worktree_file(struct index_state *istate, * objects however would tend to be slower as they need * to be individually opened and inflated. */ - if (!FAST_WORKING_DIRECTORY && !want_file && has_object_pack(oid)) + if (!FAST_WORKING_DIRECTORY && !want_file && + has_object_pack(the_repository, oid)) return 0; /* diff --git a/list-objects.c b/list-objects.c index 985d008799..31236a8dc9 100644 --- a/list-objects.c +++ b/list-objects.c @@ -41,7 +41,8 @@ static void show_object(struct traversal_context *ctx, { if (!ctx->show_object) return; - if (ctx->revs->unpacked && has_object_pack(&object->oid)) + if (ctx->revs->unpacked && has_object_pack(ctx->revs->repo, + &object->oid)) return; ctx->show_object(object, name, ctx->show_data); diff --git a/pack-bitmap.c b/pack-bitmap.c index 4fa9dfc771..d34ba9909a 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1889,7 +1889,7 @@ static void filter_packed_objects_from_bitmap(struct bitmap_index *bitmap_git, bitmap_unset(result, i); for (i = 0; i < eindex->count; ++i) { - if (has_object_pack(&eindex->objects[i]->oid)) + if (has_object_pack(the_repository, &eindex->objects[i]->oid)) bitmap_unset(result, objects_nr + i); } } diff --git a/packfile.c b/packfile.c index 096a0cd6ba..c6d7ed38f6 100644 --- a/packfile.c +++ b/packfile.c @@ -2143,16 +2143,17 @@ int find_kept_pack_entry(struct repository *r, return 0; } -int has_object_pack(const struct object_id *oid) +int has_object_pack(struct repository *repo, const struct object_id *oid) { struct pack_entry e; - return find_pack_entry(the_repository, oid, &e); + return find_pack_entry(repo, oid, &e); } -int has_object_kept_pack(const struct object_id *oid, unsigned flags) +int has_object_kept_pack(struct repository *repo, const struct object_id *oid, + unsigned flags) { struct pack_entry e; - return find_kept_pack_entry(the_repository, oid, flags, &e); + return find_kept_pack_entry(repo, oid, flags, &e); } int for_each_object_in_pack(struct packed_git *p, diff --git a/packfile.h b/packfile.h index 48d058699d..ac4f2210c5 100644 --- a/packfile.h +++ b/packfile.h @@ -193,8 +193,9 @@ const struct packed_git *has_packed_and_bad(struct repository *, const struct ob int find_pack_entry(struct repository *r, const struct object_id *oid, struct pack_entry *e); int find_kept_pack_entry(struct repository *r, const struct object_id *oid, unsigned flags, struct pack_entry *e); -int has_object_pack(const struct object_id *oid); -int has_object_kept_pack(const struct object_id *oid, unsigned flags); +int has_object_pack(struct repository *repo, const struct object_id *oid); +int has_object_kept_pack(struct repository *repo, const struct object_id *oid, + unsigned flags); /* * Return 1 if an object in a promisor packfile is or refers to the given diff --git a/prune-packed.c b/prune-packed.c index 2bb99c29df..d1c65ab10e 100644 --- a/prune-packed.c +++ b/prune-packed.c @@ -24,7 +24,7 @@ static int prune_object(const struct object_id *oid, const char *path, { int *opts = data; - if (!has_object_pack(oid)) + if (!has_object_pack(the_repository, oid)) return 0; if (*opts & PRUNE_PACKED_DRY_RUN) diff --git a/reachable.c b/reachable.c index 3e9b3dd0a4..09d2c50079 100644 --- a/reachable.c +++ b/reachable.c @@ -239,7 +239,7 @@ static int want_recent_object(struct recent_data *data, const struct object_id *oid) { if (data->ignore_in_core_kept_packs && - has_object_kept_pack(oid, IN_CORE_KEEP_PACKS)) + has_object_kept_pack(data->revs->repo, oid, IN_CORE_KEEP_PACKS)) return 0; return 1; } diff --git a/revision.c b/revision.c index f5f5b84f2b..d1d152a67b 100644 --- a/revision.c +++ b/revision.c @@ -4103,10 +4103,10 @@ enum commit_action get_commit_action(struct rev_info *revs, struct commit *commi { if (commit->object.flags & SHOWN) return commit_ignore; - if (revs->unpacked && has_object_pack(&commit->object.oid)) + if (revs->unpacked && has_object_pack(revs->repo, &commit->object.oid)) return commit_ignore; if (revs->no_kept_objects) { - if (has_object_kept_pack(&commit->object.oid, + if (has_object_kept_pack(revs->repo, &commit->object.oid, revs->keep_pack_cache_flags)) return commit_ignore; }
The functions `has_object[_kept]_pack` currently rely on the global variable `the_repository`. To eliminate global variable usage in `packfile.c`, we should progressively shift the dependency on the_repository to higher layers. Let's remove its usage from these functions and any related ones. Signed-off-by: Karthik Nayak <karthik.188@gmail.com> --- builtin/count-objects.c | 2 +- builtin/fsck.c | 2 +- builtin/pack-objects.c | 4 ++-- diff.c | 3 ++- list-objects.c | 3 ++- pack-bitmap.c | 2 +- packfile.c | 9 +++++---- packfile.h | 5 +++-- prune-packed.c | 2 +- reachable.c | 2 +- revision.c | 4 ++-- 11 files changed, 21 insertions(+), 17 deletions(-)