[2/2] fetch-pack: warn if in commit graph but not obj db

Message ID 1027ff2cb7d9af5cc9ce6b653d28150457db8703.1730235646.git.jonathantanmy@google.com (mailing list archive)
State New
Series When fetching, warn if in commit graph but not obj db

Commit Message

Jonathan Tan Oct. 29, 2024, 9:11 p.m. UTC
When fetching, there is a step in which sought objects are first checked
against the local repository; only objects that are not in the local
repository are then fetched. This check first looks up the commit graph
file, and returns "present" if the object is in there.

However, the action of first looking up the commit graph file is not
done everywhere in Git, especially if the type of the object at the time
of lookup is not known. This means that in a repo corruption situation,
a user may encounter an "object missing" error, attempt to fetch it, and
still encounter the same error later when they reattempt their original
action, because the object is present in the commit graph file but not in
the object DB.

Therefore, detect when this occurs and print a warning. (Note that
we cannot proceed to include this object in the list of objects to
be fetched without changing at least the fetch negotiation code:
what would happen is that the client will send "want X" and "have X"
and when I tested at $DAYJOB with a work server that uses JGit, the
server reasonably returned an empty packfile. And changing the fetch
negotiation code to only use the object DB when deciding what to report
as "have" would be an unnecessary slowdown, I think.)

This was discovered when a lazy fetch of a missing commit completed with
nothing actually fetched, and the writing of the commit graph file after
every fetch then attempted to read said missing commit, triggering a
lazy fetch of said missing commit, resulting in an infinite loop with no
user-visible indication (until they check the list of processes running
on their computer). With this fix, at least a warning message will be
printed. Note that although the repo corruption we discovered was caused
by a bug in GC in a partial clone, the behavior that this patch teaches
Git to warn about applies to any repo with commit graph enabled and with
a missing commit, whether it is a partial clone or not.

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
 fetch-pack.c | 22 +++++++++++++++++++---
 object.h     |  2 +-
 2 files changed, 20 insertions(+), 4 deletions(-)
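
The inconsistency described in the commit message can be modeled with a toy sketch. Nothing below is Git's real API: `struct toy_repo`, `looks_present()` and `in_graph_only()` are hypothetical names, and the two arrays stand in for the commit graph file and the object DB. The sketch assumes, as the message states, that the fetch-time check consults the commit graph first.

```c
#include <assert.h>	/* for the checks in the usage note */
#include <stddef.h>
#include <string.h>

/* Toy stand-ins for the two stores; not Git's real data structures. */
struct toy_repo {
	const char **graph;	/* ids recorded in the commit graph file */
	size_t graph_nr;
	const char **odb;	/* ids actually present in the object DB */
	size_t odb_nr;
};

/* Linear search; returns 1 if id is in the list, 0 otherwise. */
static int contains(const char **ids, size_t nr, const char *id)
{
	for (size_t i = 0; i < nr; i++)
		if (!strcmp(ids[i], id))
			return 1;
	return 0;
}

/*
 * Models the fetch-time check: the commit graph is consulted first, so an
 * id listed there is reported as present without touching the object DB.
 */
static int looks_present(const struct toy_repo *r, const char *id)
{
	if (contains(r->graph, r->graph_nr, id))
		return 1;
	return contains(r->odb, r->odb_nr, id);
}

/* Models the condition the patch warns about. */
static int in_graph_only(const struct toy_repo *r, const char *id)
{
	return contains(r->graph, r->graph_nr, id) &&
	       !contains(r->odb, r->odb_nr, id);
}
```

With a graph holding "aaa" and "bbb" and an object DB holding "aaa" and "ccc", `looks_present()` returns 1 for "bbb" even though the object is missing, and `in_graph_only()` returns 1 for it, which is the case where the patch prints its warning.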

Comments

Josh Steadmon Oct. 30, 2024, 9:22 p.m. UTC | #1
On 2024.10.29 14:11, Jonathan Tan wrote:
> When fetching, there is a step in which sought objects are first checked
> against the local repository; only objects that are not in the local
> repository are then fetched. This check first looks up the commit graph
> file, and returns "present" if the object is in there.
> 
> However, the action of first looking up the commit graph file is not
> done everywhere in Git, especially if the type of the object at the time
> of lookup is not known. This means that in a repo corruption situation,
> a user may encounter an "object missing" error, attempt to fetch it, and
> still encounter the same error later when they reattempt their original
> action, because the object is present in the commit graph file but not in
> the object DB.
> 
> Therefore, detect when this occurs and print a warning. (Note that
> we cannot proceed to include this object in the list of objects to
> be fetched without changing at least the fetch negotiation code:
> what would happen is that the client will send "want X" and "have X"
> and when I tested at $DAYJOB with a work server that uses JGit, the
> server reasonably returned an empty packfile. And changing the fetch
> negotiation code to only use the object DB when deciding what to report
> as "have" would be an unnecessary slowdown, I think.)
> 
> This was discovered when a lazy fetch of a missing commit completed with
> nothing actually fetched, and the writing of the commit graph file after
> every fetch then attempted to read said missing commit, triggering a
> lazy fetch of said missing commit, resulting in an infinite loop with no
> user-visible indication (until they check the list of processes running
> on their computer). With this fix, at least a warning message will be
> printed. Note that although the repo corruption we discovered was caused
> by a bug in GC in a partial clone, the behavior that this patch teaches
> Git to warn about applies to any repo with commit graph enabled and with
> a missing commit, whether it is a partial clone or not.
> 
> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  fetch-pack.c | 22 +++++++++++++++++++---
>  object.h     |  2 +-
>  2 files changed, 20 insertions(+), 4 deletions(-)
> 
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 6728a0d2f5..5a0020366b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -57,6 +57,7 @@ static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
>  #define ALTERNATE	(1U << 1)
>  #define COMMON		(1U << 6)
>  #define REACH_SCRATCH	(1U << 7)
> +#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)

We're defining a new flag, and we note it in object.h as well below, so
looks good so far.


>  /*
>   * After sending this many "have"s if we do not get any new ACK , we
> @@ -123,15 +124,18 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
>  }
>  
>  static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
> -					       int mark_tags_complete)
> +					       int mark_additional_complete_information)

We're already marking some completion flags here, so we're just making
the parameter name more descriptive, OK.


>  {
>  	enum object_type type;
>  	struct object_info info = { .typep = &type };
>  	struct commit *commit;
>  
>  	commit = lookup_commit_in_graph(the_repository, oid);
> -	if (commit)
> +	if (commit) {
> +		if (mark_additional_complete_information)
> +			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
>  		return commit;
> +	}

We already have a case where we're checking the commit graph, so we can
also mark the commit complete here... well, not the original "COMPLETE"
flag since we don't want to change behavior, but our new
COMPLETE_FROM_COMMIT_GRAPH flag. Sounds good.


>  
>  	while (1) {
>  		if (oid_object_info_extended(the_repository, oid, &info,
> @@ -143,7 +147,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
>  
>  			if (!tag->tagged)
>  				return NULL;
> -			if (mark_tags_complete)
> +			if (mark_additional_complete_information)
>  				tag->object.flags |= COMPLETE;
>  			oid = &tag->tagged->oid;
>  		} else {
> @@ -809,6 +813,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  	save_commit_buffer = old_save_commit_buffer;
>  }
>  
> +static void warn_in_commit_graph_only(const struct object_id *oid)
> +{
> +	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
> +		oid_to_hex(oid));
> +	warning(_("This is probably due to repo corruption."));
> +	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
> +}
> +

Here's the new warning. As mentioned in my reply to the cover letter, I
feel like it makes more sense to die(), but I don't feel too strongly
about it.


>  /*
>   * Returns 1 if every object pointed to by the given remote refs is available
>   * locally and reachable from a local ref, and 0 otherwise.
> @@ -830,6 +842,10 @@ static int everything_local(struct fetch_pack_args *args,
>  				      ref->name);
>  			continue;
>  		}
> +		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
> +			if (!has_object(the_repository, remote, 0))
> +				warn_in_commit_graph_only(remote);
> +		}

And now that we're checking what's local, we issue our warning if we
have an object missing from the DB but mentioned in the commit graph.
Seems fine, although I wonder if it makes more sense to fail earlier. It
looks like the only place we do the
`mark_additional_complete_information` checks is in `mark_complete()`,
so should we just check this condition there? No strong feelings either
way, just curious.


>  		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
>  			      ref->name);
>  	}
> diff --git a/object.h b/object.h
> index 17f32f1103..196e489253 100644
> --- a/object.h
> +++ b/object.h
> @@ -65,7 +65,7 @@ void object_array_init(struct object_array *array);
>  /*
>   * object flag allocation:
>   * revision.h:               0---------10         15               23------27
> - * fetch-pack.c:             01    67
> + * fetch-pack.c:             01    6-8
>   * negotiator/default.c:       2--5
>   * walker.c:                 0-2
>   * upload-pack.c:                4       11-----14  16-----19
> -- 
> 2.47.0.163.g1226f6d8fa-goog
> 
>

Taylor Blau Oct. 31, 2024, 8:59 p.m. UTC | #2
On Tue, Oct 29, 2024 at 02:11:05PM -0700, Jonathan Tan wrote:
> When fetching, there is a step in which sought objects are first checked
> against the local repository; only objects that are not in the local
> repository are then fetched. This check first looks up the commit graph
> file, and returns "present" if the object is in there.

OK.

> However, the action of first looking up the commit graph file is not
> done everywhere in Git, especially if the type of the object at the time
> of lookup is not known. This means that in a repo corruption situation,
> a user may encounter an "object missing" error, attempt to fetch it, and
> still encounter the same error later when they reattempt their original
> action, because the object is present in the commit graph file but not in
> the object DB.

I think the type of repository corruption here may be underspecified.

You say that we have some object, say X, whose type is not known. So we
don't load the commit-graph, realize that X is missing, and then try and
fetch it. In this scenario, is X actually in the commit-graph, but not
in the object database? Further, if X is in the commit-graph, I assume
we do not look it up there because we first try and find its type, which
fails, so we assume we don't have it (despite it appearing corruptly in
the commit-graph)?

I think that matches the behavior you're describing, but I want to make
sure that I'm not thinking of something else.

> This was discovered when a lazy fetch of a missing commit completed with
> nothing actually fetched, and the writing of the commit graph file after
> every fetch then attempted to read said missing commit, triggering a
> lazy fetch of said missing commit, resulting in an infinite loop with no
> user-visible indication (until they check the list of processes running
> on their computer).

Yuck :-).

> Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
> ---
>  fetch-pack.c | 22 +++++++++++++++++++---
>  object.h     |  2 +-
>  2 files changed, 20 insertions(+), 4 deletions(-)
>
> diff --git a/fetch-pack.c b/fetch-pack.c
> index 6728a0d2f5..5a0020366b 100644
> --- a/fetch-pack.c
> +++ b/fetch-pack.c
> @@ -57,6 +57,7 @@ static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
>  #define ALTERNATE	(1U << 1)
>  #define COMMON		(1U << 6)
>  #define REACH_SCRATCH	(1U << 7)
> +#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)
>
>  /*
>   * After sending this many "have"s if we do not get any new ACK , we
> @@ -123,15 +124,18 @@ static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
>  }
>
>  static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
> -					       int mark_tags_complete)
> +					       int mark_additional_complete_information)
>  {
>  	enum object_type type;
>  	struct object_info info = { .typep = &type };
>  	struct commit *commit;
>
>  	commit = lookup_commit_in_graph(the_repository, oid);
> -	if (commit)
> +	if (commit) {
> +		if (mark_additional_complete_information)
> +			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
>  		return commit;
> +	}
>
>  	while (1) {
>  		if (oid_object_info_extended(the_repository, oid, &info,
> @@ -143,7 +147,7 @@ static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
>
>  			if (!tag->tagged)
>  				return NULL;
> -			if (mark_tags_complete)
> +			if (mark_additional_complete_information)
>  				tag->object.flags |= COMPLETE;
>  			oid = &tag->tagged->oid;
>  		} else {
> @@ -809,6 +813,14 @@ static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
>  	save_commit_buffer = old_save_commit_buffer;
>  }
>
> +static void warn_in_commit_graph_only(const struct object_id *oid)
> +{
> +	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
> +		oid_to_hex(oid));
> +	warning(_("This is probably due to repo corruption."));
> +	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
> +}
> +
>  /*
>   * Returns 1 if every object pointed to by the given remote refs is available
>   * locally and reachable from a local ref, and 0 otherwise.
> @@ -830,6 +842,10 @@ static int everything_local(struct fetch_pack_args *args,
>  				      ref->name);
>  			continue;
>  		}
> +		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
> +			if (!has_object(the_repository, remote, 0))
> +				warn_in_commit_graph_only(remote);

You discuss this a little bit in your commit message, but I wonder if we
should just die() here. I feel like we're trying to work around a
situation where the commit-graph is obviously broken because it refers
to commit objects that don't actually exist in the object store.

A few thoughts in this area:

  - What situation provokes this to be true? I could imagine there is
    some bug that we don't fully have a grasp of. But I wonder if it is
    even easier to provoke than that, say by pruning some objects out of
    the object store, then not rewriting the commit-graph, leaving some
    of the references dangling.

  - Does 'git fsck' catch this case within the commit-graph?

  - Are there other areas of the code that rely on the assumption that all
    entries in the commit-graph actually exist on disk? If so, are they
    similarly broken?

Another thought about this whole thing is that we essentially have a
code path that says: "I found this object from the commit-graph, but
don't know if I actually have it on disk, so mark it to be checked later
via has_object()".

I wonder if it would be more straightforward to replace the call to
lookup_commit_in_graph() with a direct call to has_object() in the
deref_without_lazy_fetch() function, which I think would both (a)
eliminate the need for a new flag bit to be allocated, and (b) prevent
looking up the object twice.

Thoughts?

Thanks,
Taylor

Jonathan Tan Oct. 31, 2024, 9:23 p.m. UTC | #3
Josh Steadmon <steadmon@google.com> writes:
> >  /*
> >   * Returns 1 if every object pointed to by the given remote refs is available
> >   * locally and reachable from a local ref, and 0 otherwise.
> > @@ -830,6 +842,10 @@ static int everything_local(struct fetch_pack_args *args,
> >  				      ref->name);
> >  			continue;
> >  		}
> > +		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
> > +			if (!has_object(the_repository, remote, 0))
> > +				warn_in_commit_graph_only(remote);
> > +		}
> 
> And now that we're checking what's local, we issue our warning if we
> have an object missing from the DB but mentioned in the commit graph.
> Seems fine, although I wonder if it makes more sense to fail earlier. It
> looks like the only place we do the
> `mark_additional_complete_information` checks is in `mark_complete()`,
> so should we just check this condition there? No strong feelings either
> way, just curious.

When we were merely warning, it was useful to mark everything then
check later, so that a warning message would be printed once per
object, instead of potentially multiple times. (In the infinite case
that we discovered at $DAYJOB, it doesn't really matter since the
message is going to be printed an infinite number of times anyway,
but in the "plain" case in which the user is missing a commit and does
not have automatic commit graph writing enabled, the fetch will indeed
terminate.)

But since we're making this a fatal error, yes, it makes sense to fail
earlier. I've made the change.

Jonathan Tan Oct. 31, 2024, 9:43 p.m. UTC | #4
Taylor Blau <me@ttaylorr.com> writes:
> > However, the action of first looking up the commit graph file is not
> > done everywhere in Git, especially if the type of the object at the time
> > of lookup is not known. This means that in a repo corruption situation,
> > a user may encounter an "object missing" error, attempt to fetch it, and
> > still encounter the same error later when they reattempt their original
> > action, because the object is present in the commit graph file but not in
> > the object DB.
> 
> I think the type of repository corruption here may be underspecified.

Hmm...if you have any specific points you'd like me to elaborate on (or
better yet, wording suggestions), please let me know.

> You say that we have some object, say X, whose type is not known. So we
> don't load the commit-graph, realize that X is missing, and then try and
> fetch it.

Yes.

> In this scenario, is X actually in the commit-graph, but not
> in the object database?

Yes.

> Further, if X is in the commit-graph, I assume
> we do not look it up there because we first try and find its type, which
> fails, so we assume we don't have it (despite it appearing corruptly in
> the commit-graph)?
> 
> I think that matches the behavior you're describing, but I want to make
> sure that I'm not thinking of something else.

Strictly speaking, we are not trying to find its type. We are trying
to find the object itself. (One could argue that if we find out that
an object is a commit, we can then ignore the packfile and go look up
the commit graph file. I'm not so sure this is a good idea, but this is
moot, I think - as far as I know, we currently don't do this.)

But yes, if the object is not in the object DB, we assume we don't have
it.

> You discuss this a little bit in your commit message, but I wonder if we
> should just die() here. I feel like we're trying to work around a
> situation where the commit-graph is obviously broken because it refers
> to commit objects that don't actually exist in the object store.

Yeah, that seems to be the consensus. I've switched it to a fatal error.

> A few thoughts in this area:
> 
>   - What situation provokes this to be true? I could imagine there is
>     some bug that we don't fully have a grasp of. But I wonder if it is
>     even easier to provoke than that, say by pruning some objects out of
>     the object store, then not rewriting the commit-graph, leaving some
>     of the references dangling.

The fetching of promisor objects that are descendants of non-promisor
objects. [1]

I think that the rewriting of the commit graph happens on every repack,
thus avoiding the situation you describe (unless there is a bug there).

[1] https://lore.kernel.org/git/20241001191811.1934900-1-calvinwan@google.com/

>   - Does 'git fsck' catch this case within the commit-graph?

Honestly, I haven't checked - I've been concentrating on fixing the
fetch part for now (and also the bug that caused the missing commits
[2]).

[2] https://lore.kernel.org/git/cover.1729792911.git.jonathantanmy@google.com/

>   - Are there other areas of the code that rely on the assumption that all
>     entries in the commit-graph actually exist on disk? If so, are they
>     similarly broken?

Yes, the fetch negotiation code. It is not "broken" in that it solely
uses repo_parse_commit() which always checks the commit graph, so as
long as the commit graph has everything we need, there will be no error.

There might be other systems that rely both on the commit graph and the
object DB, and thus have an inconsistent view (so, "similarly broken" as
you describe it) but at least in the partial clone case, the severity of
the issue is not as high as in "fetch", because these other systems can
lazily fetch the missing commit and then proceed.

> Another thought about this whole thing is that we essentially have a
> code path that says: "I found this object from the commit-graph, but
> don't know if I actually have it on disk, so mark it to be checked later
> via has_object()".
> 
> I wonder if it would be more straightforward to replace the call to
> lookup_commit_in_graph() with a direct call to has_object() in the
> deref_without_lazy_fetch() function, which I think would both (a)
> eliminate the need for a new flag bit to be allocated, and (b) prevent
> looking up the object twice.
> 
> Thoughts?
> 
> Thanks,
> Taylor

This would undo the optimization in 62b5a35a33 (fetch-pack: optimize
loading of refs via commit graph, 2021-09-01), and also would not work
without changes to the fetch negotiation code - I tried to describe it
in the commit message, perhaps not very clearly, but the issue is that
even if we emit "want X", the fetch negotiation code would emit "have
X" (the X is the same in both), and at least for our JGit server at
$DAYJOB, the combination of "want X" and "have X" results in the server
sending an empty packfile (reasonable behavior, I think). (And I don't
think the changes to the fetch negotiation code are worth it.)

Taylor Blau Nov. 1, 2024, 2:33 p.m. UTC | #5
On Thu, Oct 31, 2024 at 02:43:19PM -0700, Jonathan Tan wrote:
> > Another thought about this whole thing is that we essentially have a
> > code path that says: "I found this object from the commit-graph, but
> > don't know if I actually have it on disk, so mark it to be checked later
> > via has_object()".
> >
> > I wonder if it would be more straightforward to replace the call to
> > lookup_commit_in_graph() with a direct call to has_object() in the
> > deref_without_lazy_fetch() function, which I think would both (a)
> > eliminate the need for a new flag bit to be allocated, and (b) prevent
> > looking up the object twice.
> >
> > Thoughts?
>
> This would undo the optimization in 62b5a35a33 (fetch-pack: optimize
> loading of refs via commit graph, 2021-09-01), and also would not work
> without changes to the fetch negotiation code - I tried to describe it
> in the commit message, perhaps not very clearly, but the issue is that
> even if we emit "want X", the fetch negotiation code would emit "have
> X" (the X is the same in both), and at least for our JGit server at
> $DAYJOB, the combination of "want X" and "have X" results in the server
> sending an empty packfile (reasonable behavior, I think). (And I don't
> think the changes to the fetch negotiation code are worth it.)

Thanks for the clarifications above. What I was trying to poke at here
was... doesn't the change as presented undo that optimization, just in a
different way?

In 62b5a35a33 we taught deref_without_lazy_fetch() to look up commits
through the commit-graph. But in this patch, we now call has_object()
on top of that existing check. Am I missing something obvious?

Thanks,
Taylor

Jonathan Tan Nov. 1, 2024, 5:33 p.m. UTC | #6
Taylor Blau <me@ttaylorr.com> writes:
> On Thu, Oct 31, 2024 at 02:43:19PM -0700, Jonathan Tan wrote:
> > > Another thought about this whole thing is that we essentially have a
> > > code path that says: "I found this object from the commit-graph, but
> > > don't know if I actually have it on disk, so mark it to be checked later
> > > via has_object()".
> > >
> > > I wonder if it would be more straightforward to replace the call to
> > > lookup_commit_in_graph() with a direct call to has_object() in the
> > > deref_without_lazy_fetch() function, which I think would both (a)
> > > eliminate the need for a new flag bit to be allocated, and (b) prevent
> > > looking up the object twice.
> > >
> > > Thoughts?
> >
> > This would undo the optimization in 62b5a35a33 (fetch-pack: optimize
> > loading of refs via commit graph, 2021-09-01), and also would not work
> > without changes to the fetch negotiation code - I tried to describe it
> > in the commit message, perhaps not very clearly, but the issue is that
> > even if we emit "want X", the fetch negotiation code would emit "have
> > X" (the X is the same in both), and at least for our JGit server at
> > $DAYJOB, the combination of "want X" and "have X" results in the server
> > sending an empty packfile (reasonable behavior, I think). (And I don't
> > think the changes to the fetch negotiation code are worth it.)
> 
> Thanks for the clarifications above. What I was trying to poke at here
> was... doesn't the change as presented undo that optimization, just in a
> different way?
> 
> In 62b5a35a33 we taught deref_without_lazy_fetch() to look up commits
> through the commit-graph. But in this patch, we now call has_object()
> on top of that existing check. Am I missing something obvious?
> 
> Thanks,
> Taylor

deref_without_lazy_fetch() is used in these situations:
 (1) to mark things COMPLETE (the 2nd argument is set to 1)
 (2) all other situations (the 2nd argument is set to 0)

62b5a35a33 teaches deref_without_lazy_fetch() to use the commit-graph in
all situations.

The change I have presented in this patch set teaches
deref_without_lazy_fetch() to read both the commit-graph and the object
DB in (1) but not (2). So I'm undoing the optimization, but not for
all situations.

My understanding of your suggestion was to undo the optimization in
all situations.
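
The distinction between situations (1) and (2) can be illustrated with a toy model that counts object DB lookups. This is only a sketch under assumptions: `struct toy_object`, `toy_deref()`, `toy_check()` and the lookup counter are hypothetical and do not exist in Git; only the flag value (1U << 8) is taken from the patch.

```c
#include <assert.h>	/* for the checks in the usage note */

/* Same bit as COMPLETE_FROM_COMMIT_GRAPH in the patch. */
#define TOY_COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)

/* Hypothetical object: where it can be found, plus its flag bits. */
struct toy_object {
	unsigned flags;
	int in_graph;	/* listed in the commit graph file */
	int in_odb;	/* present in the object DB */
};

/* Counts how often the (expensive) object DB is consulted. */
static int toy_odb_lookups;

static int toy_has_object(const struct toy_object *o)
{
	toy_odb_lookups++;
	return o->in_odb;
}

/*
 * Models deref_without_lazy_fetch(): a commit graph hit returns early
 * without touching the object DB; the flag is set only when 'mark' is
 * nonzero, i.e. in situation (1). Returns 1 if the object was found.
 */
static int toy_deref(struct toy_object *o, int mark)
{
	if (o->in_graph) {
		if (mark)
			o->flags |= TOY_COMPLETE_FROM_COMMIT_GRAPH;
		return 1;
	}
	return toy_has_object(o);
}

/*
 * Models the later check: only flagged objects pay for an object DB
 * lookup. Returns 1 if the object is in the commit graph only.
 */
static int toy_check(const struct toy_object *o)
{
	if (!(o->flags & TOY_COMPLETE_FROM_COMMIT_GRAPH))
		return 0;
	return !toy_has_object(o);
}
```

For an object that is in the graph but not in the object DB, calling `toy_deref()` with `mark` set to 0 followed by `toy_check()` performs no object DB lookup and reports nothing, whereas `mark` set to 1 leads to exactly one lookup and a detected inconsistency.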

Patch

diff --git a/fetch-pack.c b/fetch-pack.c
index 6728a0d2f5..5a0020366b 100644
--- a/fetch-pack.c
+++ b/fetch-pack.c
@@ -57,6 +57,7 @@  static struct string_list uri_protocols = STRING_LIST_INIT_DUP;
 #define ALTERNATE	(1U << 1)
 #define COMMON		(1U << 6)
 #define REACH_SCRATCH	(1U << 7)
+#define COMPLETE_FROM_COMMIT_GRAPH	(1U << 8)
 
 /*
  * After sending this many "have"s if we do not get any new ACK , we
@@ -123,15 +124,18 @@  static void for_each_cached_alternate(struct fetch_negotiator *negotiator,
 }
 
 static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
-					       int mark_tags_complete)
+					       int mark_additional_complete_information)
 {
 	enum object_type type;
 	struct object_info info = { .typep = &type };
 	struct commit *commit;
 
 	commit = lookup_commit_in_graph(the_repository, oid);
-	if (commit)
+	if (commit) {
+		if (mark_additional_complete_information)
+			commit->object.flags |= COMPLETE_FROM_COMMIT_GRAPH;
 		return commit;
+	}
 
 	while (1) {
 		if (oid_object_info_extended(the_repository, oid, &info,
@@ -143,7 +147,7 @@  static struct commit *deref_without_lazy_fetch(const struct object_id *oid,
 
 			if (!tag->tagged)
 				return NULL;
-			if (mark_tags_complete)
+			if (mark_additional_complete_information)
 				tag->object.flags |= COMPLETE;
 			oid = &tag->tagged->oid;
 		} else {
@@ -809,6 +813,14 @@  static void mark_complete_and_common_ref(struct fetch_negotiator *negotiator,
 	save_commit_buffer = old_save_commit_buffer;
 }
 
+static void warn_in_commit_graph_only(const struct object_id *oid)
+{
+	warning(_("You are attempting to fetch %s, which is in the commit graph file but not in the object database."),
+		oid_to_hex(oid));
+	warning(_("This is probably due to repo corruption."));
+	warning(_("If you are attempting to repair this repo corruption by refetching the missing object, use 'git fetch --refetch' with the missing object."));
+}
+
 /*
  * Returns 1 if every object pointed to by the given remote refs is available
  * locally and reachable from a local ref, and 0 otherwise.
@@ -830,6 +842,10 @@  static int everything_local(struct fetch_pack_args *args,
 				      ref->name);
 			continue;
 		}
+		if (o->flags & COMPLETE_FROM_COMMIT_GRAPH) {
+			if (!has_object(the_repository, remote, 0))
+				warn_in_commit_graph_only(remote);
+		}
 		print_verbose(args, _("already have %s (%s)"), oid_to_hex(remote),
 			      ref->name);
 	}
diff --git a/object.h b/object.h
index 17f32f1103..196e489253 100644
--- a/object.h
+++ b/object.h
@@ -65,7 +65,7 @@  void object_array_init(struct object_array *array);
 /*
  * object flag allocation:
  * revision.h:               0---------10         15               23------27
- * fetch-pack.c:             01    67
+ * fetch-pack.c:             01    6-8
  * negotiator/default.c:       2--5
  * walker.c:                 0-2
  * upload-pack.c:                4       11-----14  16-----19
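
The updated allocation comment can be sanity-checked with a small sketch that turns two rows of the table into bit masks. The helper names (`bit_range()`, `fetch_pack_mask()`, `negotiator_default_mask()`) are hypothetical, and the bit ranges are read straight from the comment above: fetch-pack.c uses bits 0, 1 and 6 through 8 after the patch, and negotiator/default.c uses bits 2 through 5. Whether overlap with the other rows matters depends on which files operate on the same objects in the same process, which this sketch leaves aside.

```c
#include <assert.h>	/* for the checks in the usage note */

/* Mask with bits lo..hi (inclusive) set; assumes lo <= hi < 32. */
static unsigned bit_range(unsigned lo, unsigned hi)
{
	unsigned width = hi - lo + 1;
	unsigned ones = width >= 32 ? ~0U : (1U << width) - 1;

	return ones << lo;
}

/* "fetch-pack.c: 01    6-8" from the updated comment. */
static unsigned fetch_pack_mask(void)
{
	return bit_range(0, 1) | bit_range(6, 8);
}

/* "negotiator/default.c: 2--5" from the same comment. */
static unsigned negotiator_default_mask(void)
{
	return bit_range(2, 5);
}
```

Here `fetch_pack_mask()` evaluates to 0x1C3, it includes the new bit (1U << 8), and its intersection with `negotiator_default_mask()` (0x3C) is zero.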