Message ID | xmqq7d7bsu2n.fsf@gitster.g (mailing list archive) |
---|---|
State | Accepted |
Commit | f8781bfda31756acdc0ae77da7e70337aedae7c9 |
Headers | show |
Series | 2.36 gitk/diff-tree --stdin regression fix | expand |
Am 26.04.22 um 18:11 schrieb Junio C Hamano: > This only surfaced as a regression after 2.36 release, but the > breakage was already there with us for at least a year. > > The diff_free() call is to be used after we completely finished with > a diffopt structure. After "git diff A B" finishes producing > output, calling it before process exit is fine. But there are > commands that prepares diff_options struct once, compares two sets > of paths, releases resources that were used to do the comparison, > then reuses the same diff_option struct to go on to compare the next > two sets of paths, like "git log -p". > > After "git log -p" finishes showing a single commit, calling it > before it goes on to the next commit is NOT fine. There is a > mechanism, the .no_free member in diff_options struct, to help "git > log" to avoid calling diff_free() after showing each commit and > instead call it just one. When the mechanism was introduced in > e900d494 (diff: add an API for deferred freeing, 2021-02-11), > however, we forgot to do the same to "diff-tree --stdin", which *is* > a moral equivalent to "git log". > > During 2.36 release cycle, we started clearing the pathspec in > diff_free(), so programs like gitk that runs > > git diff-tree --stdin -- <pathspec> > > downstream of a pipe, processing one commit after another, started > showing irrelevant comparison outside the given <pathspec> from the > second commit. The same commit, by forgetting to teach the .no_free > mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed > it for over a year, presumably because it is so seldom used an > option. > > But <pathspec> is a different story. The breakage was very > prominently visible and was reported immediately after 2.36 was > released. > > Fix this breakage by mimicking how "git log" utilizes the .no_free > member so that "diff-tree --stdin" behaves more similarly to "log". > > Protect the fix with a few new tests. We could check where reused diffopts caused a pathspec loss at runtime, like in the patch below. Then we "just" need to get the relevant test coverage to 100% and we'll find them all. With your patch on top of main, "make test" passes for me. With the patch below added as well I get failures in three test scripts: t3427-rebase-subtree.sh (Wstat: 256 Tests: 3 Failed: 2) Failed tests: 2-3 Non-zero exit status: 1 t4014-format-patch.sh (Wstat: 256 Tests: 190 Failed: 1) Failed test: 73 Non-zero exit status: 1 t9350-fast-export.sh (Wstat: 256 Tests: 50 Failed: 3) Failed tests: 30, 32, 43 Non-zero exit status: 1 The format-patch is a bit surprising to me because it already sets no_free conditionally. t4014 is successful if no_free is set in all cases, so the condition seems to be too narrow -- but I don't understand it. Didn't look at the other cases. --- diff.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/diff.c b/diff.c index ef7159968b..b7c837aca8 100644 --- a/diff.c +++ b/diff.c @@ -6455,9 +6455,16 @@ static void diff_free_ignore_regex(struct diff_options *options) void diff_free(struct diff_options *options) { + static struct diff_options *prev_options_with_pathspec; + if (options->no_free) return; + if (prev_options_with_pathspec == options && !options->pathspec.nr) + BUG("reused struct diff_options, potentially lost pathspec"); + if (options->pathspec.nr) + prev_options_with_pathspec = options; + diff_free_file(options); diff_free_ignore_regex(options); clear_pathspec(&options->pathspec); -- 2.35.3
Am 27.04.22 um 18:42 schrieb René Scharfe: > Am 26.04.22 um 18:11 schrieb Junio C Hamano: >> This only surfaced as a regression after 2.36 release, but the >> breakage was already there with us for at least a year. >> >> The diff_free() call is to be used after we completely finished with >> a diffopt structure. After "git diff A B" finishes producing >> output, calling it before process exit is fine. But there are >> commands that prepares diff_options struct once, compares two sets >> of paths, releases resources that were used to do the comparison, >> then reuses the same diff_option struct to go on to compare the next >> two sets of paths, like "git log -p". >> >> After "git log -p" finishes showing a single commit, calling it >> before it goes on to the next commit is NOT fine. There is a >> mechanism, the .no_free member in diff_options struct, to help "git >> log" to avoid calling diff_free() after showing each commit and >> instead call it just one. When the mechanism was introduced in >> e900d494 (diff: add an API for deferred freeing, 2021-02-11), >> however, we forgot to do the same to "diff-tree --stdin", which *is* >> a moral equivalent to "git log". >> >> During 2.36 release cycle, we started clearing the pathspec in >> diff_free(), so programs like gitk that runs >> >> git diff-tree --stdin -- <pathspec> >> >> downstream of a pipe, processing one commit after another, started >> showing irrelevant comparison outside the given <pathspec> from the >> second commit. The same commit, by forgetting to teach the .no_free >> mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed >> it for over a year, presumably because it is so seldom used an >> option. >> >> But <pathspec> is a different story. The breakage was very >> prominently visible and was reported immediately after 2.36 was >> released. >> >> Fix this breakage by mimicking how "git log" utilizes the .no_free >> member so that "diff-tree --stdin" behaves more similarly to "log". >> >> Protect the fix with a few new tests. > > We could check where reused diffopts caused a pathspec loss at runtime, > like in the patch below. Then we "just" need to get the relevant test > coverage to 100% and we'll find them all. > > With your patch on top of main, "make test" passes for me. With the > patch below added as well I get failures in three test scripts: > > t3427-rebase-subtree.sh (Wstat: 256 Tests: 3 Failed: 2) > Failed tests: 2-3 > Non-zero exit status: 1 > t4014-format-patch.sh (Wstat: 256 Tests: 190 Failed: 1) > Failed test: 73 > Non-zero exit status: 1 > t9350-fast-export.sh (Wstat: 256 Tests: 50 Failed: 3) > Failed tests: 30, 32, 43 > Non-zero exit status: 1 > > The format-patch is a bit surprising to me because it already sets > no_free conditionally. t4014 is successful if no_free is set in all > cases, so the condition seems to be too narrow -- but I don't understand > it. Didn't look at the other cases. > > --- > diff.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/diff.c b/diff.c > index ef7159968b..b7c837aca8 100644 > --- a/diff.c > +++ b/diff.c > @@ -6455,9 +6455,16 @@ static void diff_free_ignore_regex(struct diff_options *options) > > void diff_free(struct diff_options *options) > { > + static struct diff_options *prev_options_with_pathspec; > + > if (options->no_free) > return; > > + if (prev_options_with_pathspec == options && !options->pathspec.nr) > + BUG("reused struct diff_options, potentially lost pathspec"); > + if (options->pathspec.nr) > + prev_options_with_pathspec = options; This can report a false positive if a diffopt is reused with different pathspecs, and one of them is empty (match all). Which could be countered by using a fresh diffopt every time (e.g. pushing it into a loop). > + > diff_free_file(options); > diff_free_ignore_regex(options); > clear_pathspec(&options->pathspec);
René Scharfe <l.s.r@web.de> writes: >> + if (prev_options_with_pathspec == options && !options->pathspec.nr) >> + BUG("reused struct diff_options, potentially lost pathspec"); >> + if (options->pathspec.nr) >> + prev_options_with_pathspec = options; > > This can report a false positive if a diffopt is reused with different > pathspecs, and one of them is empty (match all). Which could be countered > by using a fresh diffopt every time (e.g. pushing it into a loop). The only use case to reset pathspec of a diffopt during iteration I can think of is the hacky[*] version of "git log --follow" where the pathspec is swapped when a rename of a single path being followed is detected. Side note: hacky because the way it swaps a single pathspec upon seeing one rename means it does not work in a mergy-branchy history where one branch renames and there are still commits that need to be explored on the other branch that had the path under its original name.
diff --git a/builtin/diff-tree.c b/builtin/diff-tree.c index 0e0ac1f167..116097a404 100644 --- a/builtin/diff-tree.c +++ b/builtin/diff-tree.c @@ -195,6 +195,7 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix) int saved_dcctc = 0; opt->diffopt.rotate_to_strict = 0; + opt->diffopt.no_free = 1; if (opt->diffopt.detect_rename) { if (!the_index.cache) repo_read_index(the_repository); @@ -217,6 +218,8 @@ int cmd_diff_tree(int argc, const char **argv, const char *prefix) } opt->diffopt.degraded_cc_to_c = saved_dcctc; opt->diffopt.needed_rename_limit = saved_nrl; + opt->diffopt.no_free = 0; + diff_free(&opt->diffopt); } return diff_result_code(&opt->diffopt, 0); diff --git a/log-tree.c b/log-tree.c index 25165e2a91..f8c18fd8b9 100644 --- a/log-tree.c +++ b/log-tree.c @@ -1098,6 +1098,7 @@ int log_tree_commit(struct rev_info *opt, struct commit *commit) opt->loginfo = &log; opt->diffopt.no_free = 1; + /* NEEDSWORK: no restoring of no_free? Why? */ if (opt->line_level_traverse) return line_log_print(opt, commit); diff --git a/t/t4013-diff-various.sh b/t/t4013-diff-various.sh index 750aee17ea..628b01f355 100755 --- a/t/t4013-diff-various.sh +++ b/t/t4013-diff-various.sh @@ -542,6 +542,20 @@ test_expect_success 'diff-tree --stdin with log formatting' ' test_cmp expect actual ' +test_expect_success 'diff-tree --stdin with pathspec' ' + cat >expect <<-EOF && + Third + + dir/sub + Second + + dir/sub + EOF + git rev-list master^ | + git diff-tree -r --stdin --name-only --format=%s dir >actual && + test_cmp expect actual +' + test_expect_success 'diff -I<regex>: setup' ' git checkout master && test_seq 50 >file0 &&
This only surfaced as a regression after 2.36 release, but the breakage was already there with us for at least a year. The diff_free() call is to be used after we completely finished with a diffopt structure. After "git diff A B" finishes producing output, calling it before process exit is fine. But there are commands that prepares diff_options struct once, compares two sets of paths, releases resources that were used to do the comparison, then reuses the same diff_option struct to go on to compare the next two sets of paths, like "git log -p". After "git log -p" finishes showing a single commit, calling it before it goes on to the next commit is NOT fine. There is a mechanism, the .no_free member in diff_options struct, to help "git log" to avoid calling diff_free() after showing each commit and instead call it just one. When the mechanism was introduced in e900d494 (diff: add an API for deferred freeing, 2021-02-11), however, we forgot to do the same to "diff-tree --stdin", which *is* a moral equivalent to "git log". During 2.36 release cycle, we started clearing the pathspec in diff_free(), so programs like gitk that runs git diff-tree --stdin -- <pathspec> downstream of a pipe, processing one commit after another, started showing irrelevant comparison outside the given <pathspec> from the second commit. The same commit, by forgetting to teach the .no_free mechanism, broke "diff-tree --stdin -I<regexp>" and nobody noticed it for over a year, presumably because it is so seldom used an option. But <pathspec> is a different story. The breakage was very prominently visible and was reported immediately after 2.36 was released. Fix this breakage by mimicking how "git log" utilizes the .no_free member so that "diff-tree --stdin" behaves more similarly to "log". Protect the fix with a few new tests. Reported-by: Matthias Aßhauer <mha1993@live.de> Helped-by: René Scharfe <l.s.r@web.de> Helped-by: Phillip Wood <phillip.wood123@gmail.com> Signed-off-by: Junio C Hamano <gitster@pobox.com> --- * I feel MUCH better with this than the revert, now Phillip helped me to get the root cause straight. Addition of clear_pathspec() to diff_tree() was *not* a mistake but is quite reasonable thing to do. Not using the .no_free hack in a code path that needed it was. builtin/diff-tree.c | 3 +++ log-tree.c | 1 + t/t4013-diff-various.sh | 14 ++++++++++++++ 3 files changed, 18 insertions(+)