Message ID | 20201221162743.96056-3-mirucam@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Finish converting git bisect to C part 3 | expand |
Miriam Rubio writes: > From: Pranit Bauva <pranit.bauva@gmail.com> > > Reimplement the `bisect_replay` shell function in C and also add > `--bisect-replay` subcommand to `git bisect--helper` to call it from > git-bisect.sh > > Using `--bisect-replay` subcommand is a temporary measure to port shell > function to C so as to use the existing test suite. > > Mentored-by: Lars Schneider <larsxschneider@gmail.com> > Mentored-by: Christian Couder <chriscool@tuxfamily.org> > Mentored-by: Johannes Schindelin <Johannes.Schindelin@gmx.de> > Signed-off-by: Pranit Bauva <pranit.bauva@gmail.com> > Signed-off-by: Tanushree Tumane <tanushreetumane@gmail.com> > Signed-off-by: Miriam Rubio <mirucam@gmail.com> > --- > builtin/bisect--helper.c | 127 ++++++++++++++++++++++++++++++++++++++- > git-bisect.sh | 34 +---------- > 2 files changed, 127 insertions(+), 34 deletions(-) > > diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c > index 1854377fa6..92c783237d 100644 > --- a/builtin/bisect--helper.c > +++ b/builtin/bisect--helper.c > @@ -31,6 +31,7 @@ static const char * const git_bisect_helper_usage[] = { > N_("git bisect--helper --bisect-auto-next"), > N_("git bisect--helper --bisect-state (bad|new) [<rev>]"), > N_("git bisect--helper --bisect-state (good|old) [<rev>...]"), > + N_("git bisect--helper --bisect-replay <filename>"), > NULL > }; > > @@ -916,6 +917,121 @@ static enum bisect_error bisect_log(void) > return status ? BISECT_FAILED : BISECT_OK; > } > > +static int get_next_word(const char *line, int pos, struct strbuf *word) > +{ > + int i, len = strlen(line), begin = 0; > + > + strbuf_reset(word); > + for (i = pos; i < len; i++) { > + if (line[i] == ' ' && begin) > + return i + 1; > + > + if (!begin) > + begin = 1; > + strbuf_addch(word, line[i]); > + } > + > + return i; > +} > + I would like to suggest a slight different approach to handling the "is the begin of the loop?" logic. If I understood correctly, the `begin` variable is to check whether is the beginning of the word processing and is changed to 1 (aka: to true) on the first loop interaction after the loop is executed for the first time. However, I believe we can check this information in different way that will simplify the logic by removing the "begin" and the "if (!begin)..." altogether. The for-loop is initialize with "i" set to "pos" which means that on the first execution the expression "i == pos" is going to be true, and "false" for the next interactions. Thus, checking if "i" is different, or better checking if "i" is greater should bring the same result. that said, the implementation might look like this: strbuf_reset(word); for (i = pos; i < len; i++) { if (line[i] == ' ' && i > pos) return i + 1; strbuf_addch(word, line[i]); } With additionally, removing the "begin" from the beginning of the function. The above was copied into the email without major tests, although I have this implementation locally just to ensure its compiles successfully. Please take this suggestion as you wish, I do not have any strong opinion on the current implementation. Also, I'm a recent contributor to the project and should not be much trusted when proposing changes as when proposal comes from project maintainer and/or Senior contributors ;). > +static int process_line(struct bisect_terms *terms, struct strbuf *line, struct strbuf *word) > +{ > + int res = 0; > + int pos = 0; > + > + while (pos < line->len) { > + pos = get_next_word(line->buf, pos, word); > + > + if (!strcmp(word->buf, "git")) > + continue; > + else if (!strcmp(word->buf, "git-bisect")) > + continue; > + else if (!strcmp(word->buf, "bisect")) > + continue; > + else if (starts_with(word->buf, "#")) > + break; > + > + get_terms(terms); > + if (check_and_set_terms(terms, word->buf)) > + return -1; > + > + if (!strcmp(word->buf, "start")) { > + struct strvec argv = STRVEC_INIT; > + int res; I believe this variable is already defined and initialize on the beginning of the function, right? If that is the case then the declaration seems duplicated here and can be avoided. > + sq_dequote_to_strvec(line->buf+pos, &argv); > + res = bisect_start(terms, argv.v, argv.nr); > + strvec_clear(&argv); > + if (res) > + return -1; Also, (see bellow) > + break; > + } > + > + if (one_of(word->buf, terms->term_good, > + terms->term_bad, "skip", NULL)) { > + if (bisect_write(word->buf, line->buf+pos, terms, 0)) > + return -1; > + break; > + } > + > + if (!strcmp(word->buf, "terms")) { > + struct strvec argv = STRVEC_INIT; > + int res; In case this supposed to be the same from the beginning of the function, the declaration seems duplicated here as well. > + sq_dequote_to_strvec(line->buf+pos, &argv); > + res = bisect_terms(terms, argv.nr == 1 ? argv.v[0] : NULL); > + strvec_clear(&argv); > + if (res) > + return -1; > + break; > + } Another suggestion, again take as you wish, you can place the "if" directly on the call of the "bisect_start()" and set the "res = -1" as the value of "res" will be used for the function return anyways. Again with the intent of simplify the implementation. The code will look something like the similar: if (bisect_terms(terms, argv.nr == 1 ? argv.v[0] : NULL)) res = -1; strvec_clear(&argv); break; I did not test the above code thoroughly though. > + error(_("Replay file contains rubbish (\"%s\")"), > + word->buf); I think this will be nicer on the same line ;). Not worth a re-roll > + res = -1; > + } > + return res; > +} > + > +static int process_replay_file(FILE *fp, struct bisect_terms *terms) > +{ > + struct strbuf line = STRBUF_INIT; > + struct strbuf word = STRBUF_INIT; > + int res = 0; > + > + while (strbuf_getline(&line, fp) != EOF) { > + res = process_line(terms, &line, &word); > + if (res) > + break; > + } > + > + strbuf_release(&line); > + strbuf_release(&word); > + return res; > +} > +
Hi Miriam, Miriam Rubio writes: > + > +static int process_replay_file(FILE *fp, struct bisect_terms *terms) > +{ > + struct strbuf line = STRBUF_INIT; > + struct strbuf word = STRBUF_INIT; > + int res = 0; > + > + while (strbuf_getline(&line, fp) != EOF) { > + res = process_line(terms, &line, &word); > + if (res) > + break; > + } > + I spotted another place where an optimization can be performed. The "if (res) .. break" conditional is only used to break the loop, which is the same job of the expression from the while-loop itself. Hence, as the purpose is to control the loop execution itself, checking the response of "process_line()" via the "res" value can be move to the loop expression itself and simplifying the code further, as shown on the following patch: -- >8 -- diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c index b887413d8d..fb15587af8 100644 --- a/builtin/bisect--helper.c +++ b/builtin/bisect--helper.c @@ -988,11 +988,8 @@ static int process_replay_file(FILE *fp, struct bisect_terms *terms) struct strbuf word = STRBUF_INIT; int res = 0; - while (strbuf_getline(&line, fp) != EOF) { + while (!res && strbuf_getline(&line, fp) != EOF) res = process_line(terms, &line, &word); - if (res) - break; - } strbuf_release(&line); strbuf_release(&word); -- >8 -- This also seems to be similar approach from "one_of()" introduced by 4ba1e5c414 (bisect--helper: rewrite `check_term_format` shell function in C, 2017-09-29). Once more, the intention is to simplify the code and improve the code maintainability. > + strbuf_release(&line); > + strbuf_release(&word); > + return res; > +} > + I hope this helps increase the code quality. Happy Holidays
Rafael Silva <rafaeloliveira.cs@gmail.com> writes: > diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c > index b887413d8d..fb15587af8 100644 > --- a/builtin/bisect--helper.c > +++ b/builtin/bisect--helper.c > @@ -988,11 +988,8 @@ static int process_replay_file(FILE *fp, struct bisect_terms *terms) > struct strbuf word = STRBUF_INIT; > int res = 0; > > - while (strbuf_getline(&line, fp) != EOF) { > + while (!res && strbuf_getline(&line, fp) != EOF) > res = process_line(terms, &line, &word); > - if (res) > - break; > - } I do not mind shorter and crisper code, but I somehow find that the original more cleanly expresses the intent. "We'll grab input lines one by one until the input runs out" and "we stop when we see a line that process_line() likes" are conditions that the loop may stop at two logically distinct levels. You can conflate them into a single boolean, making it "unless we found a line the process_line() liked in the previous round, grab the next line but stop when we ran out the input", and it may make the result shorter, but it may be easier to follow by normal readers if we kept them separate, like the original does.
Junio C Hamano writes: > Rafael Silva <rafaeloliveira.cs@gmail.com> writes: > >> diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c >> index b887413d8d..fb15587af8 100644 >> --- a/builtin/bisect--helper.c >> +++ b/builtin/bisect--helper.c >> @@ -988,11 +988,8 @@ static int process_replay_file(FILE *fp, struct bisect_terms *terms) >> struct strbuf word = STRBUF_INIT; >> int res = 0; >> >> - while (strbuf_getline(&line, fp) != EOF) { >> + while (!res && strbuf_getline(&line, fp) != EOF) >> res = process_line(terms, &line, &word); >> - if (res) >> - break; >> - } > > I do not mind shorter and crisper code, but I somehow find that the > original more cleanly expresses the intent. > > "We'll grab input lines one by one until the input runs out" and "we > stop when we see a line that process_line() likes" are conditions > that the loop may stop at two logically distinct levels. You can > conflate them into a single boolean, making it "unless we found a > line the process_line() liked in the previous round, grab the next > line but stop when we ran out the input", and it may make the result > shorter, but it may be easier to follow by normal readers if we kept > them separate, like the original does. That's a good point (and nice explanation, by the way). Before I was thinking more on the line "while we do not found a good line from process_line() and we do not finish processing the file, let's go to the next line" which lead me to proposed changes for shorten the code. However, after your explanation, I can see now and agree the original does seems easier to follow and we can as it is.
Rafael Silva <rafaeloliveira.cs@gmail.com> writes: > That's a good point (and nice explanation, by the way). Before I was > thinking more on the line "while we do not found a good line from > process_line() and we do not finish processing the file, let's go to > the next line" which lead me to proposed changes for shorten the code. > > However, after your explanation, I can see now and agree the original > does seems easier to follow and we can as it is. Well, it is very possible if you come with your version of a similar loop in three months in a *new* codepath, I may say that it is a good way to write it. As I said, I do not mind shorter and crisper code. What I am saying is that I may have preference but it is not so strong one in this case, and certainly it is not strong enough to suggest rewriting one way to the other when the initial variant in the patch (which may be either one) is good enough. Thanks.
Hi Miriam, this patch looks pretty good to me. I just have a couple comments/suggestions: On Mon, 21 Dec 2020, Miriam Rubio wrote: > diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c > index 1854377fa6..92c783237d 100644 > --- a/builtin/bisect--helper.c > +++ b/builtin/bisect--helper.c > @@ -916,6 +917,121 @@ static enum bisect_error bisect_log(void) > return status ? BISECT_FAILED : BISECT_OK; > } > > +static int get_next_word(const char *line, int pos, struct strbuf *word) > +{ > + int i, len = strlen(line), begin = 0; > + > + strbuf_reset(word); > + for (i = pos; i < len; i++) { > + if (line[i] == ' ' && begin) > + return i + 1; > + > + if (!begin) > + begin = 1; > + strbuf_addch(word, line[i]); > + } > + > + return i; > +} While I have different objections than Rafael (the function seems to want to left-trim, but does an inadequate job at it), I do not even think that we need this function. See below. > + > +static int process_line(struct bisect_terms *terms, struct strbuf *line, struct strbuf *word) > +{ > + int res = 0; > + int pos = 0; > + > + while (pos < line->len) { > + pos = get_next_word(line->buf, pos, word); > + > + if (!strcmp(word->buf, "git")) > + continue; > + else if (!strcmp(word->buf, "git-bisect")) > + continue; > + else if (!strcmp(word->buf, "bisect")) > + continue; This is not quite correct, as it would skip arbitrary amounts of "git" and "git-bisect" and "bisect", even in the middle of the line. Besides, this `while()` loop iterates over the characters of the current line, while the original loop iterated over the _lines_: while read git bisect command rev do test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue [...] As you can see, lines that do not start with "git bisect" or "git-bisect" are simply ignored by `continue`ing to the next line. In the C code, `continue` would continue to the next _word_. I'd like to strongly suggest a very different approach, where no `while()` loop is used in `process_line()` (BTW can we please rename this to `process_replay_line()` to 1) make it less generic a name and 2) convey via the name what this function is about?): const char *p; if (!skip_prefix(line->buf, "git bisect", &p) && !skip_prefix(line->buf, "git bisect", p)) return 0; As the original code used `read git bisect command rev` to parse the line, which automatically trims white-space, we might want to do something similar to that: const char *p = line->buf + strspn(line->buf, " \t"); if ((!skip_prefix(p, "git bisect", &p) && !skip_prefix(p, "git-bisect", &p)) || !isspace(*p)) return 0; p += strspn(p, " \t"); > + else if (starts_with(word->buf, "#")) > + break; Please note that the original shell code is _a lot_ more forgiving: it ignores _anything_ but "git bisect" and "git-bisect" at the beginning of the (white-space-trimmed) line. Not just comments starting with `#`. I'd like to return to the shell script's behavior. > + > + get_terms(terms); Do we really have to read the terms for every line we replay? I guess we do, as every time we use new/old after good/bad (or vice versa), the terms file gets rewritten. We might want to address this awkwardness in the future: in C, we do not have to read and write the terms file _all_ the time. > + if (check_and_set_terms(terms, word->buf)) > + return -1; > + > + if (!strcmp(word->buf, "start")) { Let's use `skip_prefix()` here, too: char *word_end = p + strcspn(p, " \t"); const char *rev = word_end + strspn(word_end, " \t"); *word_end = '\0'; /* NUL-terminate the word */ if (!strcmp("start", p)) { struct strvec argv = STRVEC_INIT; int res; sq_dequote_to_strvec(rev, &argv); res = bisect_start(terms, argv.v, argv.nr); strvec_clear(&argv); return res; } > + struct strvec argv = STRVEC_INIT; > + int res; > + sq_dequote_to_strvec(line->buf+pos, &argv); > + res = bisect_start(terms, argv.v, argv.nr); > + strvec_clear(&argv); > + if (res) > + return -1; > + break; > + } > + > + if (one_of(word->buf, terms->term_good, > + terms->term_bad, "skip", NULL)) { > + if (bisect_write(word->buf, line->buf+pos, terms, 0)) > + return -1; Apart from using `p` as above instead of `word->buf`, and `rev` instead of `line->buf+pos`, why not returning directly what `bisect_write()` returned? if (one_of(p, terms->term_good, terms->term_bad, "skip", NULL)) return bisect_write(p, rev, terms, 0); > + break; > + } > + > + if (!strcmp(word->buf, "terms")) { > + struct strvec argv = STRVEC_INIT; > + int res; > + sq_dequote_to_strvec(line->buf+pos, &argv); > + res = bisect_terms(terms, argv.nr == 1 ? argv.v[0] : NULL); We should probably error out if `argv.nr > 1`. > + strvec_clear(&argv); > + if (res) > + return -1; > + break; Let's `return res` directly. > + } > + > + error(_("Replay file contains rubbish (\"%s\")"), > + word->buf); The shell script version does this instead: die "$(gettext "?? what are you talking about?")" ;; We should do the same, if only to make life easier on the translators. > + res = -1; > + } > + return res; > +} > + > +static int process_replay_file(FILE *fp, struct bisect_terms *terms) > +{ > + struct strbuf line = STRBUF_INIT; > + struct strbuf word = STRBUF_INIT; > + int res = 0; > + > + while (strbuf_getline(&line, fp) != EOF) { > + res = process_line(terms, &line, &word); > + if (res) > + break; > + } > + > + strbuf_release(&line); > + strbuf_release(&word); > + return res; > +} A lot of this function is just boiler plate. I'd prefer to merge the code into `bisect_replay()` instead. > + > +static enum bisect_error bisect_replay(struct bisect_terms *terms, const char *filename) > +{ > + FILE *fp = NULL; > + enum bisect_error res = BISECT_OK; > + > + if (is_empty_or_missing_file(filename)) > + return error(_("cannot read file '%s' for replaying"), filename); > + > + if (bisect_reset(NULL)) > + return BISECT_FAILED; > + > + fp = fopen(filename, "r"); > + if (!fp) > + return BISECT_FAILED; > + > + res = process_replay_file(fp, terms); > + fclose(fp); > + > + if (res) > + return BISECT_FAILED; > + > + return bisect_auto_next(terms, NULL); > +} The rest looks good to me! Ciao, Dscho
diff --git a/builtin/bisect--helper.c b/builtin/bisect--helper.c index 1854377fa6..92c783237d 100644 --- a/builtin/bisect--helper.c +++ b/builtin/bisect--helper.c @@ -31,6 +31,7 @@ static const char * const git_bisect_helper_usage[] = { N_("git bisect--helper --bisect-auto-next"), N_("git bisect--helper --bisect-state (bad|new) [<rev>]"), N_("git bisect--helper --bisect-state (good|old) [<rev>...]"), + N_("git bisect--helper --bisect-replay <filename>"), NULL }; @@ -916,6 +917,121 @@ static enum bisect_error bisect_log(void) return status ? BISECT_FAILED : BISECT_OK; } +static int get_next_word(const char *line, int pos, struct strbuf *word) +{ + int i, len = strlen(line), begin = 0; + + strbuf_reset(word); + for (i = pos; i < len; i++) { + if (line[i] == ' ' && begin) + return i + 1; + + if (!begin) + begin = 1; + strbuf_addch(word, line[i]); + } + + return i; +} + +static int process_line(struct bisect_terms *terms, struct strbuf *line, struct strbuf *word) +{ + int res = 0; + int pos = 0; + + while (pos < line->len) { + pos = get_next_word(line->buf, pos, word); + + if (!strcmp(word->buf, "git")) + continue; + else if (!strcmp(word->buf, "git-bisect")) + continue; + else if (!strcmp(word->buf, "bisect")) + continue; + else if (starts_with(word->buf, "#")) + break; + + get_terms(terms); + if (check_and_set_terms(terms, word->buf)) + return -1; + + if (!strcmp(word->buf, "start")) { + struct strvec argv = STRVEC_INIT; + int res; + sq_dequote_to_strvec(line->buf+pos, &argv); + res = bisect_start(terms, argv.v, argv.nr); + strvec_clear(&argv); + if (res) + return -1; + break; + } + + if (one_of(word->buf, terms->term_good, + terms->term_bad, "skip", NULL)) { + if (bisect_write(word->buf, line->buf+pos, terms, 0)) + return -1; + break; + } + + if (!strcmp(word->buf, "terms")) { + struct strvec argv = STRVEC_INIT; + int res; + sq_dequote_to_strvec(line->buf+pos, &argv); + res = bisect_terms(terms, argv.nr == 1 ? argv.v[0] : NULL); + strvec_clear(&argv); + if (res) + return -1; + break; + } + + error(_("Replay file contains rubbish (\"%s\")"), + word->buf); + res = -1; + } + return res; +} + +static int process_replay_file(FILE *fp, struct bisect_terms *terms) +{ + struct strbuf line = STRBUF_INIT; + struct strbuf word = STRBUF_INIT; + int res = 0; + + while (strbuf_getline(&line, fp) != EOF) { + res = process_line(terms, &line, &word); + if (res) + break; + } + + strbuf_release(&line); + strbuf_release(&word); + return res; +} + +static enum bisect_error bisect_replay(struct bisect_terms *terms, const char *filename) +{ + FILE *fp = NULL; + enum bisect_error res = BISECT_OK; + + if (is_empty_or_missing_file(filename)) + return error(_("cannot read file '%s' for replaying"), filename); + + if (bisect_reset(NULL)) + return BISECT_FAILED; + + fp = fopen(filename, "r"); + if (!fp) + return BISECT_FAILED; + + res = process_replay_file(fp, terms); + fclose(fp); + + if (res) + return BISECT_FAILED; + + return bisect_auto_next(terms, NULL); +} + int cmd_bisect__helper(int argc, const char **argv, const char *prefix) { enum { @@ -929,7 +1045,8 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix) BISECT_NEXT, BISECT_AUTO_NEXT, BISECT_STATE, - BISECT_LOG + BISECT_LOG, + BISECT_REPLAY } cmdmode = 0; int res = 0, nolog = 0; struct option options[] = { @@ -953,6 +1070,8 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix) N_("mark the state of ref (or refs)"), BISECT_STATE), OPT_CMDMODE(0, "bisect-log", &cmdmode, N_("output the contents of BISECT_LOG"), BISECT_LOG), + OPT_CMDMODE(0, "bisect-replay", &cmdmode, + N_("replay the bisection process from the given file"), BISECT_REPLAY), OPT_BOOL(0, "no-log", &nolog, N_("no log for BISECT_WRITE")), OPT_END() @@ -1020,6 +1139,12 @@ int cmd_bisect__helper(int argc, const char **argv, const char *prefix) return error(_("--bisect-log requires 0 arguments")); res = bisect_log(); break; + case BISECT_REPLAY: + if (argc != 1) + return error(_("no logfile given")); + set_terms(&terms, "bad", "good"); + res = bisect_replay(&terms, argv[0]); + break; default: BUG("unknown subcommand %d", cmdmode); } diff --git a/git-bisect.sh b/git-bisect.sh index c6149846ff..79bcd31bd7 100755 --- a/git-bisect.sh +++ b/git-bisect.sh @@ -77,38 +77,6 @@ bisect_visualize() { eval '"$@"' --bisect -- $(cat "$GIT_DIR/BISECT_NAMES") } -bisect_replay () { - file="$1" - test "$#" -eq 1 || die "$(gettext "No logfile given")" - test -r "$file" || die "$(eval_gettext "cannot read \$file for replaying")" - git bisect--helper --bisect-reset || exit - oIFS="$IFS" IFS="$IFS$(printf '\015')" - while read git bisect command rev tail - do - test "$git $bisect" = "git bisect" || test "$git" = "git-bisect" || continue - if test "$git" = "git-bisect" - then - rev="$command" - command="$bisect" - fi - get_terms - git bisect--helper --check-and-set-terms "$command" "$TERM_GOOD" "$TERM_BAD" || exit - get_terms - case "$command" in - start) - eval "git bisect--helper --bisect-start $rev $tail" ;; - "$TERM_GOOD"|"$TERM_BAD"|skip) - git bisect--helper --bisect-write "$command" "$rev" "$TERM_GOOD" "$TERM_BAD" || exit;; - terms) - git bisect--helper --bisect-terms $rev || exit;; - *) - die "$(gettext "?? what are you talking about?")" ;; - esac - done <"$file" - IFS="$oIFS" - git bisect--helper --bisect-auto-next || exit -} - bisect_run () { git bisect--helper --bisect-next-check $TERM_GOOD $TERM_BAD fail || exit @@ -203,7 +171,7 @@ case "$#" in reset) git bisect--helper --bisect-reset "$@" ;; replay) - bisect_replay "$@" ;; + git bisect--helper --bisect-replay "$@" || exit;; log) git bisect--helper --bisect-log || exit;; run)