Message ID | 631b9a86778f15b7086e5f17fe850ffa151dd341.1730409376.git.jonathantanmy@google.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | When fetching, die if in commit graph but not obj db | expand |
Jonathan Tan <jonathantanmy@google.com> writes: > static struct commit *deref_without_lazy_fetch(const struct object_id *oid, > - int mark_tags_complete) > + int mark_tags_complete_and_check_obj_db) > { > enum object_type type; > struct object_info info = { .typep = &type }; > struct commit *commit; > > commit = lookup_commit_in_graph(the_repository, oid); > - if (commit) > + if (commit) { > + if (mark_tags_complete_and_check_obj_db) { > + if (!has_object(the_repository, oid, 0)) > + die_in_commit_graph_only(oid); > + } > return commit; > + } Hmph, even when we are not doing the mark-tags-complete thing, wouldn't it be a fatal error if the commit graph claims a commit exists but we are missing it? It also makes me wonder if it would be sufficient to prevent us from saying "have X" if we just pretend as if lookup_commit_in_graph() returned NULL in this case. In any case, infinitely recursing to lazily fetch a single commit is definitely worth fixing. Thanks for digging to the bottom of the problem and fixing it.
Junio C Hamano <gitster@pobox.com> writes: >> commit = lookup_commit_in_graph(the_repository, oid); >> - if (commit) >> + if (commit) { >> + if (mark_tags_complete_and_check_obj_db) { >> + if (!has_object(the_repository, oid, 0)) >> + die_in_commit_graph_only(oid); >> + } >> return commit; >> + } > > Hmph, even when we are not doing the mark-tags-complete thing, > wouldn't it be a fatal error if the commit graph claims a commit > exists but we are missing it? > > It also makes me wonder if it would be sufficient to prevent us from > saying "have X" if we just pretend as if lookup_commit_in_graph() > returned NULL in this case. Again, sorry for the noise. I think the posted patch is better without either of these two, simply because the "commit graph lies" case is a repository corruption, and "git fsck" should catch such a corruption (and if not, we should make sure it does). The normal codepaths should assume a healthy working repository. As has_object() is not without cost, an extra check is warranted only because not checking will go into infinite recursion. If it does not make us fail in such an unpleasant way if we return such a commit when we are not doing the mark-tags-complete thing (but makes us fail in some other controlled way), not paying cost for an extra check is the right thing. Thanks.
On Fri, Nov 1, 2024 at 12:25 PM Junio C Hamano <gitster@pobox.com> wrote: > > Junio C Hamano <gitster@pobox.com> writes: > > >> commit = lookup_commit_in_graph(the_repository, oid); > >> - if (commit) > >> + if (commit) { > >> + if (mark_tags_complete_and_check_obj_db) { > >> + if (!has_object(the_repository, oid, 0)) > >> + die_in_commit_graph_only(oid); > >> + } > >> return commit; > >> + } > > > > Hmph, even when we are not doing the mark-tags-complete thing, > > wouldn't it be a fatal error if the commit graph claims a commit > > exists but we are missing it? > > > > It also makes me wonder if it would be sufficient to prevent us from > > saying "have X" if we just pretend as if lookup_commit_in_graph() > > returned NULL in this case. > > Again, sorry for the noise. > > I think the posted patch is better without either of these two, > simply because the "commit graph lies" case is a repository > corruption, and "git fsck" should catch such a corruption (and if > not, we should make sure it does). > > The normal codepaths should assume a healthy working repository. > > As has_object() is not without cost, an extra check is warranted > only because not checking will go into infinite recursion. If it > does not make us fail in such an unpleasant way if we return such a > commit when we are not doing the mark-tags-complete thing (but makes > us fail in some other controlled way), not paying cost for an extra > check is the right thing. > > Thanks. Although the scenario I faked in t/t5330-no-lazy-fetch-with-commit-graph.sh usually does not occur, if we are unfortunate enough to encounter this issue, I hope it can automatically fix the problem as much as possible without relying on me to take an extra action. Thanks.
On Thu, Oct 31, 2024 at 02:19:01PM -0700, Jonathan Tan wrote: > diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh > index 5eb28f0512..feccd58324 100755 > --- a/t/t5330-no-lazy-fetch-with-commit-graph.sh > +++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh > @@ -39,7 +39,7 @@ test_expect_success 'fetch any commit from promisor with the usage of the commit > test_commit -C with-commit any-commit && > anycommit=$(git -C with-commit rev-parse HEAD) && > GIT_TRACE="$(pwd)/trace.txt" \ > - git -C with-commit-graph fetch origin $anycommit 2>err && > + test_must_fail git -C with-commit-graph fetch origin $anycommit 2>err && It appears that this line breaks CI: https://github.com/ttaylorr/git/actions/runs/11631453312/job/32392591229 because you're using a one-shot environment variable assignment before calling a shell function. This should instead be: test_must_fail env GIT_TRACE="$(pwd)/trace.txt" \ git -C with-commit-graph fetch origin $anycommit 2>err && Thanks, Taylor
Junio C Hamano <gitster@pobox.com> writes: > Jonathan Tan <jonathantanmy@google.com> writes: > > > static struct commit *deref_without_lazy_fetch(const struct object_id *oid, > > - int mark_tags_complete) > > + int mark_tags_complete_and_check_obj_db) > > { > > enum object_type type; > > struct object_info info = { .typep = &type }; > > struct commit *commit; > > > > commit = lookup_commit_in_graph(the_repository, oid); > > - if (commit) > > + if (commit) { > > + if (mark_tags_complete_and_check_obj_db) { > > + if (!has_object(the_repository, oid, 0)) > > + die_in_commit_graph_only(oid); > > + } > > return commit; > > + } > > Hmph, even when we are not doing the mark-tags-complete thing, > wouldn't it be a fatal error if the commit graph claims a commit > exists but we are missing it? If we can detect this cheaply, yes it would be ideal if every time we read a commit from the commit graph, we also check that the commit exists in the object DB. I don't think we can do it cheaply, though. > It also makes me wonder if it would be sufficient to prevent us from > saying "have X" if we just pretend as if lookup_commit_in_graph() > returned NULL in this case. "have X" is controlled by the packfile negotiation part, so we would have to change that. (This part controls whether we send "want X" for a specific OID or not.) > In any case, infinitely recursing to lazily fetch a single commit is > definitely worth fixing. Thanks for digging to the bottom of the > problem and fixing it. Thanks.
Junio C Hamano <gitster@pobox.com> writes: > Junio C Hamano <gitster@pobox.com> writes: > > >> commit = lookup_commit_in_graph(the_repository, oid); > >> - if (commit) > >> + if (commit) { > >> + if (mark_tags_complete_and_check_obj_db) { > >> + if (!has_object(the_repository, oid, 0)) > >> + die_in_commit_graph_only(oid); > >> + } > >> return commit; > >> + } > > > > Hmph, even when we are not doing the mark-tags-complete thing, > > wouldn't it be a fatal error if the commit graph claims a commit > > exists but we are missing it? > > > > It also makes me wonder if it would be sufficient to prevent us from > > saying "have X" if we just pretend as if lookup_commit_in_graph() > > returned NULL in this case. > > Again, sorry for the noise. > > I think the posted patch is better without either of these two, > simply because the "commit graph lies" case is a repository > corruption, and "git fsck" should catch such a corruption (and if > not, we should make sure it does). > > The normal codepaths should assume a healthy working repository. > > As has_object() is not without cost, an extra check is warranted > only because not checking will go into infinite recursion. If it > does not make us fail in such an unpleasant way if we return such a > commit when we are not doing the mark-tags-complete thing (but makes > us fail in some other controlled way), not paying cost for an extra > check is the right thing. > > Thanks. Just checking...by "the posted patch is better without either of these two", do you mean that we should not use has_object() here? I included it here in as narrow a scope as possible (when "mark_tags_complete_and_check_obj_db" is true) precisely because not checking will go into infinite recursion, as you said. (And indeed I did not expand the scope because I agree that the normal codepaths should assume a healthy working repository.)
Han Xin <hanxin.hx@bytedance.com> writes: > Although the scenario I faked in t/t5330-no-lazy-fetch-with-commit-graph.sh > usually does not occur, if we are unfortunate enough to encounter this issue, > I hope it can automatically fix the problem as much as possible without > relying on me to take an extra action. > > Thanks. Note that unfortunately the user will still need to take an extra action. My goal at first was to try to teach Git to fetch the commit (in the commit graph file but not in the object DB) anyway, so that (as you said) the problem will fix itself, but that requires not only a change in the part of the code that emits "want", but also the part that emits "have" (the fetch negotiation code). At that point, I decided that it's better to stop early and warn the user (it was not a fatal error in the original version of the code, but I have changed it).
Taylor Blau <me@ttaylorr.com> writes: > On Thu, Oct 31, 2024 at 02:19:01PM -0700, Jonathan Tan wrote: > > diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh > > index 5eb28f0512..feccd58324 100755 > > --- a/t/t5330-no-lazy-fetch-with-commit-graph.sh > > +++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh > > @@ -39,7 +39,7 @@ test_expect_success 'fetch any commit from promisor with the usage of the commit > > test_commit -C with-commit any-commit && > > anycommit=$(git -C with-commit rev-parse HEAD) && > > GIT_TRACE="$(pwd)/trace.txt" \ > > - git -C with-commit-graph fetch origin $anycommit 2>err && > > + test_must_fail git -C with-commit-graph fetch origin $anycommit 2>err && > > It appears that this line breaks CI: > > https://github.com/ttaylorr/git/actions/runs/11631453312/job/32392591229 > > because you're using a one-shot environment variable assignment before > calling a shell function. > > This should instead be: > > test_must_fail env GIT_TRACE="$(pwd)/trace.txt" \ > git -C with-commit-graph fetch origin $anycommit 2>err && > > Thanks, > Taylor Ah, thanks. I've made the change locally. There are a few ongoing conversations about this patch set (thanks everyone for participating) so I'll wait a while before sending out the next set.
Jonathan Tan <jonathantanmy@google.com> writes: > Junio C Hamano <gitster@pobox.com> writes: >> Junio C Hamano <gitster@pobox.com> writes: >> >> >> commit = lookup_commit_in_graph(the_repository, oid); >> >> - if (commit) >> >> + if (commit) { >> >> + if (mark_tags_complete_and_check_obj_db) { >> >> + if (!has_object(the_repository, oid, 0)) >> >> + die_in_commit_graph_only(oid); >> >> + } >> >> return commit; >> >> + } >> > >> > Hmph, even when we are not doing the mark-tags-complete thing, >> > wouldn't it be a fatal error if the commit graph claims a commit >> > exists but we are missing it? >> > >> > It also makes me wonder if it would be sufficient to prevent us from >> > saying "have X" if we just pretend as if lookup_commit_in_graph() >> > returned NULL in this case. >> >> Again, sorry for the noise. >> >> I think the posted patch is better without either of these two, >> simply because the "commit graph lies" case is a repository >> corruption, and "git fsck" should catch such a corruption (and if >> not, we should make sure it does). >> >> The normal codepaths should assume a healthy working repository. >> >> As has_object() is not without cost, an extra check is warranted >> only because not checking will go into infinite recursion. If it >> does not make us fail in such an unpleasant way if we return such a >> commit when we are not doing the mark-tags-complete thing (but makes >> us fail in some other controlled way), not paying cost for an extra >> check is the right thing. >> >> Thanks. > > Just checking...by "the posted patch is better without either > of these two", do you mean that we should not use has_object() > here? No, "these two" refers to two changes I hinted at in my message, i.e. (1) regardless of mark_tags_complete_and_check_obj_db shouldn't we check with has_object() and die? and (2) if we commit=NULL and keep going, would it be sufficient to fix it?
diff --git a/fetch-pack.c b/fetch-pack.c index 6728a0d2f5..fe1fb3c1b7 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -122,16 +122,29 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator, cb(negotiator, cache.items[i]); } +static void die_in_commit_graph_only(const struct object_id *oid) +{ + die(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database.\n" + "This is probably due to repo corruption.\n" + "If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."), + oid_to_hex(oid)); +} + static struct commit *deref_without_lazy_fetch(const struct object_id *oid, - int mark_tags_complete) + int mark_tags_complete_and_check_obj_db) { enum object_type type; struct object_info info = { .typep = &type }; struct commit *commit; commit = lookup_commit_in_graph(the_repository, oid); - if (commit) + if (commit) { + if (mark_tags_complete_and_check_obj_db) { + if (!has_object(the_repository, oid, 0)) + die_in_commit_graph_only(oid); + } return commit; + } while (1) { if (oid_object_info_extended(the_repository, oid, &info, @@ -143,7 +156,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid, if (!tag->tagged) return NULL; - if (mark_tags_complete) + if (mark_tags_complete_and_check_obj_db) tag->object.flags |= COMPLETE; oid = &tag->tagged->oid; } else { diff --git a/t/t5330-no-lazy-fetch-with-commit-graph.sh b/t/t5330-no-lazy-fetch-with-commit-graph.sh index 5eb28f0512..feccd58324 100755 --- a/t/t5330-no-lazy-fetch-with-commit-graph.sh +++ b/t/t5330-no-lazy-fetch-with-commit-graph.sh @@ -39,7 +39,7 @@ test_expect_success 'fetch any commit from promisor with the usage of the commit test_commit -C with-commit any-commit && anycommit=$(git -C with-commit rev-parse HEAD) && GIT_TRACE="$(pwd)/trace.txt" \ - git -C with-commit-graph fetch origin $anycommit 2>err && + test_must_fail git -C with-commit-graph fetch origin $anycommit 2>err && ! grep "fatal: promisor-remote: unable to fork off fetch subprocess" err && grep "git fetch origin" trace.txt >actual && test_line_count = 1 actual
When fetching, there is a step in which sought objects are first checked against the local repository; only objects that are not in the local repository are then fetched. This check first looks up the commit graph file, and returns "present" if the object is in there. However, the action of first looking up the commit graph file is not done everywhere in Git, especially if the type of the object at the time of lookup is not known. This means that in a repo corruption situation, a user may encounter an "object missing" error, attempt to fetch it, and still encounter the same error later when they reattempt their original action, because the object is present in the commit graph file but not in the object DB. Therefore, make it a fatal error when this occurs. (Note that we cannot proceed to include this object in the list of objects to be fetched without changing at least the fetch negotiation code: what would happen is that the client will send "want X" and "have X" and when I tested at $DAYJOB with a work server that uses JGit, the server reasonably returned an empty packfile. And changing the fetch negotiation code to only use the object DB when deciding what to report as "have" would be an unnecessary slowdown, I think.) This was discovered when a lazy fetch of a missing commit completed with nothing actually fetched, and the writing of the commit graph file after every fetch then attempted to read said missing commit, triggering a lazy fetch of said missing commit, resulting in an infinite loop with no user-visible indication (until they check the list of processes running on their computer). With this fix, there is no infinite loop. Note that although the repo corruption we discovered was caused by a bug in GC in a partial clone, the behavior that this patch teaches Git to warn about applies to any repo with commit graph enabled and with a missing commit, whether it is a partial clone or not. t5330, introduced in 3a1ea94a49 (commit-graph.c: no lazy fetch in lookup_commit_in_graph(), 2022-07-01), tests that an interaction between fetch and the commit graph does not cause an infinite loop. This patch changes the exit code in that situation, so that test had to be changed. Signed-off-by: Jonathan Tan <jonathantanmy@google.com> --- fetch-pack.c | 19 ++++++++++++++++--- t/t5330-no-lazy-fetch-with-commit-graph.sh | 2 +- 2 files changed, 17 insertions(+), 4 deletions(-)