Message ID | 20191230234453.255082-1-jonathantanmy@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] revision: allow missing promisor objects on CLI | expand |
Jonathan Tan wrote: > Commit 4cf67869b2 ("list-objects.c: don't segfault for missing cmdline > objects", 2018-12-06) prevented some segmentation faults from occurring > by tightening handling of missing objects provided through the CLI: if > --ignore-missing is set, then it is OK (and the missing object ignored, > just like one would if encountered in traversal). > > However, in the case that --ignore-missing is not set but > --exclude-promisor-objects is set, there is still no distinction between > the case wherein the missing object is a promisor object and the case > wherein it is not. This is unnecessarily restrictive, since if a missing > promisor object is encountered in traversal, it is ignored; likewise it > should be ignored if provided through the CLI. Therefore, distinguish > between these 2 cases. (As a bonus, the code is now simpler.) nit about tenses, not worth a reroll on its own: "As a bonus, this makes the code simpler" (since commit messages describe the status quo before the patch in the present tense). [...] > --- a/revision.c > +++ b/revision.c > @@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name, > if (!repo_parse_commit(revs->repo, c)) > object = (struct object *) c; > else > + /* > + * There is something wrong with the commit. > + * repo_parse_commit() will have already printed an > + * error message. For our purposes, treat as missing. > + */ > object = NULL; > } else { > + /* > + * There is something wrong with the object. parse_object() If we got here, we are in the !commit case, which is not inherently wrong, right? Is the intent something like the following? if (oid_object_info(...) == OBJ_COMMIT && !repo_parse_commit(...)) { ... good ... } else if (parse_object(...)) { ... good ... } else { /* * An error occured while parsing the object. * parse_commit or parse_object has already printed an * error message. For our purposes, it's safe to * assume the object as missing. */ object = NULL; } This might be easiest to understand as a separate mini-function. Something like /* * Like parse_object, but optimized by reading commits from the * commit graph. * * If the repository has commit graphs, repo_parse_commit() avoids * reading the object buffer, so use it whenever possible. */ static struct object *parse_object_probably_commit( struct repository *r, const struct object_id *oid) { struct commit *c; if (oid_object_info(r, oid, NULL) != OBJ_COMMIT) return parse_object(r, oid); c = lookup_commit(r, oid); if (repo_parse_commit(r, c)) return NULL; return (struct object *) c; } static struct object *get_reference(struct rev_info *revs, ...) { struct object *object = parse_object_probably_commit(revs->repo, oid); if (!object) /* * An error occured while parsing the object. * parse_object_probably_commit has already printed an * error message. For our purposes, it's safe to * assume the object as missing. */ ; By the way, why doesn't parse_object itself check the commit graph for commit objects? [...] > @@ -1907,7 +1917,18 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi > verify_non_filename(revs->prefix, arg); > object = get_reference(revs, arg, &oid, flags ^ local_flags); > if (!object) > - return revs->ignore_missing ? 0 : -1; > + /* > + * If this object is corrupt, get_reference() prints an error > + * message and treats it as missing. By "and treats it as missing" does this mean "and we should treat it as missing"? (Forgive my ignorance.) Why do we treat corrupt objects as missing? Is this for graceful degredation when trying to recover data from a corrupt repository? Thanks, Jonathan
> > However, in the case that --ignore-missing is not set but > > --exclude-promisor-objects is set, there is still no distinction between > > the case wherein the missing object is a promisor object and the case > > wherein it is not. This is unnecessarily restrictive, since if a missing > > promisor object is encountered in traversal, it is ignored; likewise it > > should be ignored if provided through the CLI. Therefore, distinguish > > between these 2 cases. (As a bonus, the code is now simpler.) > > nit about tenses, not worth a reroll on its own: "As a bonus, this > makes the code simpler" (since commit messages describe the status quo > before the patch in the present tense). OK. > > --- a/revision.c > > +++ b/revision.c > > @@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name, > > if (!repo_parse_commit(revs->repo, c)) > > object = (struct object *) c; > > else > > + /* > > + * There is something wrong with the commit. > > + * repo_parse_commit() will have already printed an > > + * error message. For our purposes, treat as missing. > > + */ > > object = NULL; > > } else { > > + /* > > + * There is something wrong with the object. parse_object() > > If we got here, we are in the !commit case, which is not inherently wrong, > right? [snip] Ah, good catch. It should be "If parse_object() returns NULL, ..." > By the way, why doesn't parse_object itself check the commit graph for > commit objects? Because that's how I coded it in ec0c5798ee ("revision: use commit graph in get_reference()", 2018-12-28). In the commit message, I said: > Another way to accomplish this effect would be to modify parse_object() > to use the commit graph if possible; however, I did not want to change > parse_object()'s current behavior of always checking the object > signature of the returned object. But that doesn't mean that we cannot change it. > By "and treats it as missing" does this mean "and we should treat it > as missing"? (Forgive my ignorance.) I don't fully know if we should, hence my specific wording :-P but see my answer to your next question. > Why do we treat corrupt objects as missing? Is this for graceful > degredation when trying to recover data from a corrupt repository? This (at least, treating wrong-hash objects the same as missing) has been true since acdeec62cb ("Don't ever return corrupt objects from "parse_object()"", 2007-03-20) (and that commit message has no explanation). I think this is best - I consider corrupt objects to be similar to missing objects with respect to repository corruption, so for me it is logical to treat them the same way.
diff --git a/revision.c b/revision.c index 8136929e23..af1e31b4fc 100644 --- a/revision.c +++ b/revision.c @@ -370,8 +370,18 @@ static struct object *get_reference(struct rev_info *revs, const char *name, if (!repo_parse_commit(revs->repo, c)) object = (struct object *) c; else + /* + * There is something wrong with the commit. + * repo_parse_commit() will have already printed an + * error message. For our purposes, treat as missing. + */ object = NULL; } else { + /* + * There is something wrong with the object. parse_object() + * will have already printed an error message. For our + * purposes, treat as missing. + */ object = parse_object(revs->repo, oid); } @@ -1907,7 +1917,18 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi verify_non_filename(revs->prefix, arg); object = get_reference(revs, arg, &oid, flags ^ local_flags); if (!object) - return revs->ignore_missing ? 0 : -1; + /* + * If this object is corrupt, get_reference() prints an error + * message and treats it as missing. + * + * get_reference() returns NULL only if this object is missing + * and ignore_missing is true, or this object is a (missing) + * promisor object and exclude_promisor_objects is true. In + * both these cases, we can safely ignore this object because + * this object will not appear in output and cannot be used as + * a source of UNINTERESTING ancestors (since it is missing). + */ + return 0; add_rev_cmdline(revs, object, arg_, REV_CMD_REV, flags ^ local_flags); add_pending_object_with_path(revs, object, arg, oc.mode, oc.path); free(oc.path); diff --git a/t/t0410-partial-clone.sh b/t/t0410-partial-clone.sh index a3988bd4b8..fd28f5402a 100755 --- a/t/t0410-partial-clone.sh +++ b/t/t0410-partial-clone.sh @@ -416,15 +416,9 @@ test_expect_success 'rev-list dies for missing objects on cmd line' ' git -C repo config extensions.partialclone "arbitrary string" && for OBJ in "$COMMIT" "$TREE" "$BLOB"; do - test_must_fail git -C repo rev-list --objects \ + git -C repo rev-list --objects \ --exclude-promisor-objects "$OBJ" && - test_must_fail git -C repo rev-list --objects-edge-aggressive \ - --exclude-promisor-objects "$OBJ" && - - # Do not die or crash when --ignore-missing is passed. - git -C repo rev-list --ignore-missing --objects \ - --exclude-promisor-objects "$OBJ" && - git -C repo rev-list --ignore-missing --objects-edge-aggressive \ + git -C repo rev-list --objects-edge-aggressive \ --exclude-promisor-objects "$OBJ" done '