Message ID | 20200924060344.15541-1-chriscool@tuxfamily.org (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | [v2] bisect: don't use invalid oid as rev when starting | expand |
Hi Christian, On Thu, 24 Sep 2020, Christian Couder wrote: > In 06f5608c14 (bisect--helper: `bisect_start` shell function > partially in C, 2019-01-02), we changed the following shell > code: > > - rev=$(git rev-parse -q --verify "$arg^{commit}") || { > - test $has_double_dash -eq 1 && > - die "$(eval_gettext "'\$arg' does not appear to be a valid revision")" > - break > - } > - revs="$revs $rev" These are awfully long lines. The reason is that you kept the indentation of the diff. But that's actually not necessary, because we do not need to apply a diff here; This code snippet is intended purely for human consumption. What I suggested in my adaptation of your patch was to lose the diff markers and to decrease the insane amount of indentation to just one (and two) horizontal tabs. > into: > > + char *commit_id = xstrfmt("%s^{commit}", arg); > + if (get_oid(commit_id, &oid) && has_double_dash) > + die(_("'%s' does not appear to be a valid " > + "revision"), arg); > + > + string_list_append(&revs, oid_to_hex(&oid)); > + free(commit_id); > > In case of an invalid "arg" when "has_double_dash" is false, the old > code would "break" out of the argument loop. > > In the new C code though, `oid_to_hex(&oid)` is unconditonally > appended to "revs". This is wrong first because "oid" is junk as > `get_oid(commit_id, &oid)` failed and second because it doesn't break > out of the argument loop. > > Not breaking out of the argument loop means that "arg" is then not > treated as a path restriction. > > Signed-off-by: Christian Couder <chriscool@tuxfamily.org> > Signed-off-by: Johannes Schindelin <johannes.schindelin@gmx.de> > --- > This patch is made on top of e1cfff6765 (Sixteenth batch, 2020-09-22) > and incorporates Dscho's suggestions. > > Thanks to Junio and Dscho for reviewing the first version. > > builtin/bisect--helper.c | 14 ++++++-------- > t/t6030-bisect-porcelain.sh | 7 +++++++ > 2 files changed, 13 insertions(+), 8 deletions(-) > > diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c > index 7dcc1b5188..f4762e1774 100644 > --- a/builtin/bisect--helper.c > +++ b/builtin/bisect--helper.c > @@ -484,15 +484,13 @@ static int bisect_start(struct bisect_terms *terms, const char **argv, int argc) > terms->term_bad = xstrdup(arg); > } else if (starts_with(arg, "--")) { > return error(_("unrecognized option: '%s'"), arg); > - } else { > - char *commit_id = xstrfmt("%s^{commit}", arg); > - if (get_oid(commit_id, &oid) && has_double_dash) > - die(_("'%s' does not appear to be a valid " > - "revision"), arg); > - > + } else if (!get_oid_committish(arg, &oid)) > string_list_append(&revs, oid_to_hex(&oid)); > - free(commit_id); > - } > + else if (has_double_dash) > + die(_("'%s' does not appear to be a valid " > + "revision"), arg); > + else > + break; Thank you! > } > pathspec_pos = i; > > diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh > index b886529e59..70c39a9459 100755 > --- a/t/t6030-bisect-porcelain.sh > +++ b/t/t6030-bisect-porcelain.sh > @@ -82,6 +82,13 @@ test_expect_success 'bisect fails if given any junk instead of revs' ' > git bisect bad $HASH4 > ' > > +test_expect_success 'bisect start without -- uses unknown arg as path restriction' ' To avoid the overly long line (and also to re-use existing naming conventions), I replaced "path restrictions" by "pathspecs" here. What do you think? Ciao, Dscho > + git bisect reset && > + git bisect start foo bar && > + grep foo ".git/BISECT_NAMES" && > + grep bar ".git/BISECT_NAMES" > +' > + > test_expect_success 'bisect reset: back in the master branch' ' > git bisect reset && > echo "* master" > branch.expect && > -- > 2.28.0.585.ge1cfff6765 > >
Hi Dscho, On Thu, Sep 24, 2020 at 11:59 AM Johannes Schindelin <Johannes.Schindelin@gmx.de> wrote: > On Thu, 24 Sep 2020, Christian Couder wrote: > > - rev=$(git rev-parse -q --verify "$arg^{commit}") || { > > - test $has_double_dash -eq 1 && > > - die "$(eval_gettext "'\$arg' does not appear to be a valid revision")" > > - break > > - } > > - revs="$revs $rev" > > These are awfully long lines. The reason is that you kept the indentation > of the diff. But that's actually not necessary, because we do not need to > apply a diff here; This code snippet is intended purely for human > consumption. > > What I suggested in my adaptation of your patch was to lose the diff > markers and to decrease the insane amount of indentation to just one (and > two) horizontal tabs. Yeah, I didn't realize that. When I am sent some code or patch like this, I often hesitate between: - using it verbatim, which can create issues as it makes me more likely to overlook something in the case the sender didn't fully check everything - looking at the differences with the existing code/patch and applying them one by one, which has the risk of missing or forgetting a difference I guess the best would be to do both and then check the differences between the 2 results, but it feels like twice the amount of work for this step. > > diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh > > index b886529e59..70c39a9459 100755 > > --- a/t/t6030-bisect-porcelain.sh > > +++ b/t/t6030-bisect-porcelain.sh > > @@ -82,6 +82,13 @@ test_expect_success 'bisect fails if given any junk instead of revs' ' > > git bisect bad $HASH4 > > ' > > > > +test_expect_success 'bisect start without -- uses unknown arg as path restriction' ' > > To avoid the overly long line (and also to re-use existing naming > conventions), I replaced "path restrictions" by "pathspecs" here. What do > you think? It's not a huge issue, but I tend to prefer using "restrictions" because the tests that check that these arguments are used properly are called "restricting bisection on one dir" and "restricting bisection on one dir and a file". So I feel that it keeps test names more coherent. Best, Christian.
Christian Couder <christian.couder@gmail.com> writes: >> > +test_expect_success 'bisect start without -- uses unknown arg as path restriction' ' >> >> To avoid the overly long line (and also to re-use existing naming >> conventions), I replaced "path restrictions" by "pathspecs" here. What do >> you think? > > It's not a huge issue, but I tend to prefer using "restrictions" > because the tests that check that these arguments are used properly > are called "restricting bisection on one dir" and "restricting > bisection on one dir and a file". So I feel that it keeps test names > more coherent. Hmph, but in the context of a sentence "use an arg as X", we should try to pick a well-known word to describe various Git arguments for X, no? The one you are using, in order to filter the set of commits involved in the operation to those that touch specific set of paths, already has its own established name and the word is "pathspec". So...?
Christian Couder <christian.couder@gmail.com> writes: > - } else { > - char *commit_id = xstrfmt("%s^{commit}", arg); > - if (get_oid(commit_id, &oid) && has_double_dash) > - die(_("'%s' does not appear to be a valid " > - "revision"), arg); > - > + } else if (!get_oid_committish(arg, &oid)) This is wrong, I think. The "_committish" only applies to the fact that the disambiguation includes only the objects that are committishes, and the result are not peeled---you'll get an annotated tag back in oid, if you give it an annotated tag that points at a commit. This is not what ^{commit} does. It may happen to work if the downstream consumers peel objects (which are appended to the 'revs' here) down to commit when they are used, and if somebody verified that is indeed the case, it would be OK, but if this patch is presented as "unlike the previous broken one, this is the faithful conversion of the original in shell, so there is no need to verify anything outside of the patch context", that is a problem. We may need to audit recent additions of get_oid_committish() calls in our codebase. I suspect there may be other instances of the same mistake. Other than that, the code structure does look more straight-forward compared to the previous round. A fix on this version would involve peeling what is in oid down to commit, and you need to handle errors during that process, so it may not stay pretty with a fix, though. I dunno. Thanks. > string_list_append(&revs, oid_to_hex(&oid)); > - free(commit_id); > - } > + else if (has_double_dash) > + die(_("'%s' does not appear to be a valid " > + "revision"), arg); > + else > + break; > } > pathspec_pos = i; > > diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh > index b886529e59..70c39a9459 100755 > --- a/t/t6030-bisect-porcelain.sh > +++ b/t/t6030-bisect-porcelain.sh > @@ -82,6 +82,13 @@ test_expect_success 'bisect fails if given any junk instead of revs' ' > git bisect bad $HASH4 > ' > > +test_expect_success 'bisect start without -- uses unknown arg as path restriction' ' > + git bisect reset && > + git bisect start foo bar && > + grep foo ".git/BISECT_NAMES" && > + grep bar ".git/BISECT_NAMES" > +' > + > test_expect_success 'bisect reset: back in the master branch' ' > git bisect reset && > echo "* master" > branch.expect &&
Junio C Hamano <gitster@pobox.com> writes: > We may need to audit recent additions of get_oid_committish() calls > in our codebase. I suspect there may be other instances of the same > mistake. > > Other than that, the code structure does look more straight-forward > compared to the previous round. A fix on this version would involve > peeling what is in oid down to commit, and you need to handle errors > during that process, so it may not stay pretty with a fix, though. > I dunno. I'll queue it with this band-aid on top for now. Thanks. builtin/bisect--helper.c | 7 ++++--- t/t6030-bisect-porcelain.sh | 2 +- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c index a1f97e3f6c..2fcc023a3b 100644 --- a/builtin/bisect--helper.c +++ b/builtin/bisect--helper.c @@ -474,13 +474,14 @@ static int bisect_start(struct bisect_terms *terms, int no_checkout, } else if (starts_with(arg, "--") && !one_of(arg, "--term-good", "--term-bad", NULL)) { return error(_("unrecognized option: '%s'"), arg); - } else if (!get_oid_committish(arg, &oid)) + } else if (!get_oidf(&oid, "%s^{commit}", arg)) { string_list_append(&revs, oid_to_hex(&oid)); - else if (has_double_dash) + } else if (has_double_dash) { die(_("'%s' does not appear to be a valid " "revision"), arg); - else + } else { break; + } } pathspec_pos = i; diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh index 94179b6acf..4dbfa63ca1 100755 --- a/t/t6030-bisect-porcelain.sh +++ b/t/t6030-bisect-porcelain.sh @@ -82,7 +82,7 @@ test_expect_success 'bisect fails if given any junk instead of revs' ' git bisect bad $HASH4 ' -test_expect_success 'bisect start without -- uses unknown arg as path restriction' ' +test_expect_success 'bisect start without -- takes unknown arg as pathspec' ' git bisect reset && git bisect start foo bar && grep foo ".git/BISECT_NAMES" &&
Junio C Hamano <gitster@pobox.com> writes: > We may need to audit recent additions of get_oid_committish() calls > in our codebase. I suspect there may be other instances of the same > mistake. Interim progress report. I've only looked at files that use get_oid_treeish() but audited all uses of get_oid_*ish() in them. The results are as follows. * builtin/reset.c::parse_args() makes get_oid_committish() and get_oid_treeish() only to discard the object name, because it wants to ensure the args can be peeled down to such. OK. * builtin/reset.c::cmd_reset() applies get_oid_committish() and get_oid_treeish() to the result taken from the above, but then uses lookup_commit_reference() and parse_tree_indirect() to peel the result to the desired type. OK. * notes.c::init_notes() uses get_oid_treeish() to validate that the notes ref can be read as a tree, and then uses get_tree_entry() on it, which in turn uses read_object_with_reference() for tree_type so it tolerates a commit object. OK. I didn't audit the following hits of get_oid_committish(). There might be a similar mistake as you made in v2, or there may not be. I am undecided if I should just move on, marking them as left-over-bits ;-) builtin/blame.c: if (get_oid_committish(i->string, &oid)) builtin/checkout.c: repo_get_oid_committish(the_repository, branch->name, &branch->oid); builtin/rev-parse.c: if (!get_oid_committish(start, &start_oid) && !get_oid_committish(end, &end_oid)) { builtin/rev-parse.c: if (get_oid_committish(arg, &oid) || commit.c: if (get_oid_committish(name, &oid)) revision.c: if (get_oid_committish(arg, &oid)) sequencer.c: !get_oid_committish(buf.buf, &oid)) sha1-name.c: st = repo_get_oid_committish(r, sb.buf, &oid_tmp); sha1-name.c: if (repo_get_oid_committish(r, dots[3] ? (dots + 3) : "HEAD", &oid_tmp)) sha1-name.c:int repo_get_oid_committish(struct repository *r, t/helper/test-reach.c: if (get_oid_committish(buf.buf + 2, &oid))
Junio C Hamano <gitster@pobox.com> writes: > I didn't audit the following hits of get_oid_committish(). There > might be a similar mistake as you made in v2, or there may not be. > > I am undecided if I should just move on, marking them as > left-over-bits ;-) > > > > builtin/blame.c: if (get_oid_committish(i->string, &oid)) This one throws the object name of revs to be skipped to a list, and because revision traversal works on commit objects, if the user gives an annotated tag and expects the underlying commit is ignored, it may appear as a bug. But in the same function a list of revs to be ignored is read from file using oidset_parse_file() that in turn uses parse_oid_hex() without even validating if the named object exists, I would say it is OK---after all, if it hurts, the user can refrain from doing so ;-) But it would be nice to fix all issues around this caller. After collecting the object names to an oidset, somebody should go through the list, peel them down to commit and make sure they exist, or something like that. A possible #leftoverbits. > builtin/checkout.c: repo_get_oid_committish(the_repository, branch->name, &branch->oid); This one is probably OK as branch refs are supposed to point at commits and not annotated tags that point at commits. > builtin/rev-parse.c: if (!get_oid_committish(start, &start_oid) && !get_oid_committish(end, &end_oid)) { This one handles "rev-parse v1.0..v2.0" which gives "^v1.0 v2.0" but using the (unpeeled) object name. It is fine and should not be changed to auto-peel. > builtin/rev-parse.c: if (get_oid_committish(arg, &oid) || This is immediately followed by lookup_commit_reference() to peel as needed. OK. > commit.c: if (get_oid_committish(name, &oid)) This is part of lookup_commit_reference_by_name(), which peels and parses it down to an in-core commit object instance. OK. > revision.c: if (get_oid_committish(arg, &oid)) This is followed by a loop to peel it as needed. OK. > sequencer.c: !get_oid_committish(buf.buf, &oid)) This feeds the contents of rebase-merge/stopped-sha file. I presume that the contents of this file (which is not directly shown to the end users) is always a commit object name, so this is OK. Use of _committish() may probably be overkill for this internal bookkeeping file. If we stop make_patch() from shortening then probably we can change it to parse_oid_hex() to expect and read the full object name. > sha1-name.c: st = repo_get_oid_committish(r, sb.buf, &oid_tmp); > sha1-name.c: if (repo_get_oid_committish(r, dots[3] ? (dots + 3) : "HEAD", &oid_tmp)) Since I know those who wrote this old part of the codebase knew what they were doing, I do not have to comment, but these are fine. They are all peeled to commit as appropriate by calling lookup_commit_reference_gently() before feeding the result to get_merge_bases(). > sha1-name.c:int repo_get_oid_committish(struct repository *r, This is the implementation ;-) > t/helper/test-reach.c: if (get_oid_committish(buf.buf + 2, &oid)) This peels afterwards, so it is OK. The true reason I went through all the callers was to see if _all_ the callers want to either ignore the resulting object name (i.e. they want to make sure that the arg given can be peeled down to an appropriate type) or wants the object name to be peeled to the type. If that were the case (and from the above, it clearly isn't), we could change the semantics of get_oid_*ish() so that the resulting oid is already peeled down to the wanted type and that could simplify the current callers that are peeling the result themselves. But because some callers do not want to see the result peeled, we shouldn't touch what get_oid_*ish() functions do.
On Thu, Sep 24, 2020 at 8:55 PM Junio C Hamano <gitster@pobox.com> wrote: > > Christian Couder <christian.couder@gmail.com> writes: > > > - } else { > > - char *commit_id = xstrfmt("%s^{commit}", arg); > > - if (get_oid(commit_id, &oid) && has_double_dash) > > - die(_("'%s' does not appear to be a valid " > > - "revision"), arg); > > - > > + } else if (!get_oid_committish(arg, &oid)) > > This is wrong, I think. The "_committish" only applies to the fact > that the disambiguation includes only the objects that are > committishes, and the result are not peeled---you'll get an > annotated tag back in oid, if you give it an annotated tag that > points at a commit. > > This is not what ^{commit} does. Thanks for finding this. > It may happen to work if the downstream consumers peel objects > (which are appended to the 'revs' here) down to commit when they are > used, and if somebody verified that is indeed the case, it would be > OK, but if this patch is presented as "unlike the previous broken > one, this is the faithful conversion of the original in shell, so > there is no need to verify anything outside of the patch context", > that is a problem. I agree that it's better if there is no need to verify anything outside of the patch context. So I agree with your fix. I am also ok with using "pathspec" in the test description of the new test. Thanks, Christian.
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c index 7dcc1b5188..f4762e1774 100644 --- a/builtin/bisect--helper.c +++ b/builtin/bisect--helper.c @@ -484,15 +484,13 @@ static int bisect_start(struct bisect_terms *terms, const char **argv, int argc) terms->term_bad = xstrdup(arg); } else if (starts_with(arg, "--")) { return error(_("unrecognized option: '%s'"), arg); - } else { - char *commit_id = xstrfmt("%s^{commit}", arg); - if (get_oid(commit_id, &oid) && has_double_dash) - die(_("'%s' does not appear to be a valid " - "revision"), arg); - + } else if (!get_oid_committish(arg, &oid)) string_list_append(&revs, oid_to_hex(&oid)); - free(commit_id); - } + else if (has_double_dash) + die(_("'%s' does not appear to be a valid " + "revision"), arg); + else + break; } pathspec_pos = i; diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh index b886529e59..70c39a9459 100755 --- a/t/t6030-bisect-porcelain.sh +++ b/t/t6030-bisect-porcelain.sh @@ -82,6 +82,13 @@ test_expect_success 'bisect fails if given any junk instead of revs' ' git bisect bad $HASH4 ' +test_expect_success 'bisect start without -- uses unknown arg as path restriction' ' + git bisect reset && + git bisect start foo bar && + grep foo ".git/BISECT_NAMES" && + grep bar ".git/BISECT_NAMES" +' + test_expect_success 'bisect reset: back in the master branch' ' git bisect reset && echo "* master" > branch.expect &&