diff mbox series

[v2] revision: allow missing promisor objects on CLI

Message ID 20191230234453.255082-1-jonathantanmy@google.com (mailing list archive)
State New, archived
Headers show
Series [v2] revision: allow missing promisor objects on CLI | expand

Commit Message

Jonathan Tan Dec. 30, 2019, 11:44 p.m. UTC
Commit 4cf67869b2 ("list-objects.c: don't segfault for missing cmdline
objects", 2018-12-06) prevented some segmentation faults from occurring
by tightening handling of missing objects provided through the CLI: if
--ignore-missing is set, then it is OK (and the missing object ignored,
just like one would if encountered in traversal).

However, in the case that --ignore-missing is not set but
--exclude-promisor-objects is set, there is still no distinction between
the case wherein the missing object is a promisor object and the case
wherein it is not. This is unnecessarily restrictive, since if a missing
promisor object is encountered in traversal, it is ignored; likewise it
should be ignored if provided through the CLI. Therefore, distinguish
between these 2 cases. (As a bonus, the code is now simpler.)

(Note that this only affects handling of missing promisor objects.
Handling of non-missing promisor objects is already done by setting all
of them to UNINTERESTING in prepare_revision_walk().)

Additionally, clarify in get_reference() that error messages are already
being printed by the functions called (parse_object(),
repo_parse_commit(), and parse_commit_buffer() - invoked by the latter).

Signed-off-by: Jonathan Tan <jonathantanmy@google.com>
---
Changes from v1: Improved code comments and commit message

> This is the case where oid must be COMMIT from oid_object_info()'s
> point of view, but repo_parse_commit() finds it as a non-commit, and
> object becomes NULL.  This is quite different from the normal lazy
> clone case where exclude-promisor-objects etc. wants to cover, that
> the object whose name is oid is truly missing because it can be
> fetched later from elsewhere.  Instead, we have found that there is
> an inconsistency in the data we have about the object, iow, a
> possible corruption.

Thanks! I should have looked at the first half of get_reference() more
carefully.

If there is corruption in the form of hash mismatch, parse_object() will
print a message and then return NULL, leaving get_reference() to handle
it - and treat it as missing in this case. It seems reasonable to me to
handle the repo_parse_commit() failure in a similar way. I've added
comments to clarify that error messages are being printed.
---
 revision.c               | 23 ++++++++++++++++++++++-
 t/t0410-partial-clone.sh | 10 ++--------
 2 files changed, 24 insertions(+), 9 deletions(-)

Comments

Jonathan Nieder Dec. 31, 2019, 12:09 a.m. UTC | #1
Jonathan Tan wrote:

> Commit 4cf67869b2 ("list-objects.c: don't segfault for missing cmdline
> objects", 2018-12-06) prevented some segmentation faults from occurring
> by tightening handling of missing objects provided through the CLI: if
> --ignore-missing is set, then it is OK (and the missing object ignored,
> just like one would if encountered in traversal).
>
> However, in the case that --ignore-missing is not set but
> --exclude-promisor-objects is set, there is still no distinction between
> the case wherein the missing object is a promisor object and the case
> wherein it is not. This is unnecessarily restrictive, since if a missing
> promisor object is encountered in traversal, it is ignored; likewise it
> should be ignored if provided through the CLI. Therefore, distinguish
> between these 2 cases. (As a bonus, the code is now simpler.)

nit about tenses, not worth a reroll on its own: "As a bonus, this
makes the code simpler" (since commit messages describe the status quo
before the patch in the present tense).

[...]
> --- a/revision.c
> +++ b/revision.c
> @@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
>  		if (!repo_parse_commit(revs->repo, c))
>  			object = (struct object *) c;
>  		else
> +			/*
> +			 * There is something wrong with the commit.
> +			 * repo_parse_commit() will have already printed an
> +			 * error message. For our purposes, treat as missing.
> +			 */
>  			object = NULL;
>  	} else {
> +		/*
> +		 * There is something wrong with the object. parse_object()

If we got here, we are in the !commit case, which is not inherently wrong,
right?  Is the intent something like the following?

	if (oid_object_info(...) == OBJ_COMMIT && !repo_parse_commit(...)) {
		... good ...
	} else if (parse_object(...)) {
		... good ...
	} else {
		/*
		 * An error occured while parsing the object.
		 * parse_commit or parse_object has already printed an
		 * error message.  For our purposes, it's safe to
		 * assume the object as missing.
		 */
		object = NULL;
	}

This might be easiest to understand as a separate mini-function.
Something like

 /*
  * Like parse_object, but optimized by reading commits from the
  * commit graph.
  *
  * If the repository has commit graphs, repo_parse_commit() avoids
  * reading the object buffer, so use it whenever possible.
  */
 static struct object *parse_object_probably_commit(
		struct repository *r, const struct object_id *oid)
 {
	struct commit *c;

	if (oid_object_info(r, oid, NULL) != OBJ_COMMIT)
		return parse_object(r, oid);

	c = lookup_commit(r, oid);
	if (repo_parse_commit(r, c))
		return NULL;
	return (struct object *) c;
 }

 static struct object *get_reference(struct rev_info *revs, ...)
 {
	struct object *object = parse_object_probably_commit(revs->repo, oid);
	if (!object)
		/*
		 * An error occured while parsing the object.
		 * parse_object_probably_commit has already printed an
		 * error message.  For our purposes, it's safe to
		 * assume the object as missing.
		 */
		;

By the way, why doesn't parse_object itself check the commit graph for
commit objects?

[...]
> @@ -1907,7 +1917,18 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
>  		verify_non_filename(revs->prefix, arg);
>  	object = get_reference(revs, arg, &oid, flags ^ local_flags);
>  	if (!object)
> -		return revs->ignore_missing ? 0 : -1;
> +		/*
> +		 * If this object is corrupt, get_reference() prints an error
> +		 * message and treats it as missing.

By "and treats it as missing" does this mean "and we should treat it
as missing"?  (Forgive my ignorance.)

Why do we treat corrupt objects as missing?  Is this for graceful
degredation when trying to recover data from a corrupt repository?

Thanks,
Jonathan
Jonathan Tan Jan. 2, 2020, 8:49 p.m. UTC | #2
> > However, in the case that --ignore-missing is not set but
> > --exclude-promisor-objects is set, there is still no distinction between
> > the case wherein the missing object is a promisor object and the case
> > wherein it is not. This is unnecessarily restrictive, since if a missing
> > promisor object is encountered in traversal, it is ignored; likewise it
> > should be ignored if provided through the CLI. Therefore, distinguish
> > between these 2 cases. (As a bonus, the code is now simpler.)
> 
> nit about tenses, not worth a reroll on its own: "As a bonus, this
> makes the code simpler" (since commit messages describe the status quo
> before the patch in the present tense).

OK.

> > --- a/revision.c
> > +++ b/revision.c
> > @@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name,
> >  		if (!repo_parse_commit(revs->repo, c))
> >  			object = (struct object *) c;
> >  		else
> > +			/*
> > +			 * There is something wrong with the commit.
> > +			 * repo_parse_commit() will have already printed an
> > +			 * error message. For our purposes, treat as missing.
> > +			 */
> >  			object = NULL;
> >  	} else {
> > +		/*
> > +		 * There is something wrong with the object. parse_object()
> 
> If we got here, we are in the !commit case, which is not inherently wrong,
> right?

[snip]

Ah, good catch. It should be "If parse_object() returns NULL, ..."

> By the way, why doesn't parse_object itself check the commit graph for
> commit objects?

Because that's how I coded it in ec0c5798ee ("revision: use commit graph
in get_reference()", 2018-12-28). In the commit message, I said:

> Another way to accomplish this effect would be to modify parse_object()
> to use the commit graph if possible; however, I did not want to change
> parse_object()'s current behavior of always checking the object
> signature of the returned object.

But that doesn't mean that we cannot change it.

> By "and treats it as missing" does this mean "and we should treat it
> as missing"?  (Forgive my ignorance.)

I don't fully know if we should, hence my specific wording :-P but see
my answer to your next question.

> Why do we treat corrupt objects as missing?  Is this for graceful
> degredation when trying to recover data from a corrupt repository?

This (at least, treating wrong-hash objects the same as missing) has
been true since acdeec62cb ("Don't ever return corrupt objects from
"parse_object()"", 2007-03-20) (and that commit message has no
explanation). I think this is best - I consider corrupt objects to be
similar to missing objects with respect to repository corruption, so for
me it is logical to treat them the same way.
diff mbox series

Patch

diff --git a/revision.c b/revision.c
index 8136929e23..af1e31b4fc 100644
--- a/revision.c
+++ b/revision.c
@@ -370,8 +370,18 @@  static struct object *get_reference(struct rev_info *revs, const char *name,
 		if (!repo_parse_commit(revs->repo, c))
 			object = (struct object *) c;
 		else
+			/*
+			 * There is something wrong with the commit.
+			 * repo_parse_commit() will have already printed an
+			 * error message. For our purposes, treat as missing.
+			 */
 			object = NULL;
 	} else {
+		/*
+		 * There is something wrong with the object. parse_object()
+		 * will have already printed an error message. For our
+		 * purposes, treat as missing.
+		 */
 		object = parse_object(revs->repo, oid);
 	}
 
@@ -1907,7 +1917,18 @@  int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 		verify_non_filename(revs->prefix, arg);
 	object = get_reference(revs, arg, &oid, flags ^ local_flags);
 	if (!object)
-		return revs->ignore_missing ? 0 : -1;
+		/*
+		 * If this object is corrupt, get_reference() prints an error
+		 * message and treats it as missing.
+		 *
+		 * get_reference() returns NULL only if this object is missing
+		 * and ignore_missing is true, or this object is a (missing)
+		 * promisor object and exclude_promisor_objects is true. In
+		 * both these cases, we can safely ignore this object because
+		 * this object will not appear in output and cannot be used as
+		 * a source of UNINTERESTING ancestors (since it is missing).
+		 */
+		return 0;
 	add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags);
 	add_pending_object_with_path(revs, object, arg, oc.mode, oc.path);
 	free(oc.path);
diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh
index a3988bd4b8..fd28f5402a 100755
--- a/t/t0410-partial-clone.sh
+++ b/t/t0410-partial-clone.sh
@@ -416,15 +416,9 @@  test_expect_success 'rev-list dies for missing objects on cmd line' '
 	git -C repo config extensions.partialclone "arbitrary string" &&
 
 	for OBJ in "$COMMIT" "$TREE" "$BLOB"; do
-		test_must_fail git -C repo rev-list --objects \
+		git -C repo rev-list --objects \
 			--exclude-promisor-objects "$OBJ" &&
-		test_must_fail git -C repo rev-list --objects-edge-aggressive \
-			--exclude-promisor-objects "$OBJ" &&
-
-		# Do not die or crash when --ignore-missing is passed.
-		git -C repo rev-list --ignore-missing --objects \
-			--exclude-promisor-objects "$OBJ" &&
-		git -C repo rev-list --ignore-missing --objects-edge-aggressive \
+		git -C repo rev-list --objects-edge-aggressive \
 			--exclude-promisor-objects "$OBJ"
 	done
 '