Message ID | pull.739.git.1601650736489.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | b736873ada227bb222b62eceb6cf60c0f5476b64 |
Headers | show |
Series | commit-reach: fix in_merge_bases_many bug | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > From: Derrick Stolee <dstolee@microsoft.com> > > Way back in f9b8908b (commit.c: use generation numbers for > in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit > the in_merge_bases() walk. This works just fine as long as the > caller is checking only two commits, but when there are multiple, > there is a possibility that this heuristic is _very wrong_. Thanks. That bug defeats the point of _many() part of the function. > Some code moves since then has changed this method to > repo_in_merge_bases_many() inside commit-reach.c. The heuristic > computes the minimum generation number of the "reference" list, then > compares this number to the generation number of the "commit". > > In a recent topic, a test was added that used in_merge_bases_many() > to test if a commit was reachable from a number of commits pulled > from a reflog. However, this highlighted the problem: if any of the > reference commits have a smaller generation number than the given > commit, then the walk is skipped _even if there exist some with > higher generation number_. > > This heuristic is wrong! It must check the MAXIMUM generation number > of the reference commits, not the MINIMUM. Correct > Fix in_merge_bases_many() with commit-graphs > > Johannes alerted me to the difficulties Srinidhi was having with > in_merge_bases_many() and commit-graphs. Sorry that I hadn't seen that > thread and the issues therein. > > After working with Johannes to investigate what was happening, we found > a 2-year-old bug in the generation number checks! > > Thanks, -Stolee Good. Thanks.
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > The fix itself is to swap min_generation with a max_generation in > repo_in_merge_bases_many(). > > Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> > Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> The order of these looked iffy so I rearranged them in chrono order, Srinidhi's series found it broken, and with Dscho's help, fix was produced and you sent it with your sign off. > commit-reach.c | 8 ++++---- > t/helper/test-reach.c | 2 ++ > t/t6600-test-reach.sh | 30 ++++++++++++++++++++++++++++++ > 3 files changed, 36 insertions(+), 4 deletions(-) I've applied this and then rebased Srinidhi's latest on top, with the following to re-enable the commit-graph in the codepath. t5533 and t6600 passes and when the fix in the message I am responding to is reverted, t5533 again fails. Thanks. SQUASH??? drop "hide the breakage under the rug" hack Derrick and Dscho spotted and fixed the incorrect optimization in in_merge_bases_many() when the commit-graph is in use, so there is no longer a reason why we would want to disable commit-graph locally in this codepath. diff --git a/remote.c b/remote.c index 98a578f5dc..8c91188677 100644 --- a/remote.c +++ b/remote.c @@ -2405,34 +2405,18 @@ static int is_reachable_in_reflog(const char *local, const struct ref *remote) return ret; } -/* Toggle the "commit-graph" feature; return the previously set state. */ -static int toggle_commit_graph(struct repository *repo, int disable) { - int prev = repo->commit_graph_disabled; - repo->commit_graph_disabled = disable; - return prev; -} - /* * Check for reachability of a remote-tracking * ref in the reflog entries of its local ref. */ static void check_if_includes_upstream(struct ref *remote) { - int prev; struct ref *local = get_local_ref(remote->name); if (!local) return; - /* - * TODO: Remove "toggle_commit_graph()" calls around the check. - * Depending on whether "commit-graph" enabled or not, - * "in_merge_bases_many()" returns different results; - * disable it temporarily when the check runs. - */ - prev = toggle_commit_graph(the_repository, 1); if (is_reachable_in_reflog(local->name, remote) <= 0) remote->unreachable = 1; - toggle_commit_graph(the_repository, prev); } static void apply_cas(struct push_cas_option *cas,
On 10/2/2020 2:51 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> The fix itself is to swap min_generation with a max_generation in >> repo_in_merge_bases_many(). >> >> Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> >> Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> >> Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > > The order of these looked iffy so I rearranged them in chrono order, > Srinidhi's series found it broken, and with Dscho's help, fix was > produced and you sent it with your sign off. Sure. I can agree with that chronological order and will keep that in mind for next time. >> commit-reach.c | 8 ++++---- >> t/helper/test-reach.c | 2 ++ >> t/t6600-test-reach.sh | 30 ++++++++++++++++++++++++++++++ >> 3 files changed, 36 insertions(+), 4 deletions(-) > > I've applied this and then rebased Srinidhi's latest on top, with > the following to re-enable the commit-graph in the codepath. t5533 > and t6600 passes and when the fix in the message I am responding to > is reverted, t5533 again fails. > > Thanks. Thanks for double-checking. I also think that dropping the "hide the error" patch is prudent. Thanks, -Stolee
Hello, On 10/02/2020 14:58, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <dstolee@microsoft.com> > > Way back in f9b8908b (commit.c: use generation numbers for > in_merge_bases(), 2018-05-01), a heuristic was used to short-circuit > the in_merge_bases() walk. This works just fine as long as the > caller is checking only two commits, but when there are multiple, > there is a possibility that this heuristic is _very wrong_. > > Some code moves since then has changed this method to > repo_in_merge_bases_many() inside commit-reach.c. The heuristic > computes the minimum generation number of the "reference" list, then > compares this number to the generation number of the "commit". > > In a recent topic, a test was added that used in_merge_bases_many() > to test if a commit was reachable from a number of commits pulled > from a reflog. However, this highlighted the problem: if any of the > reference commits have a smaller generation number than the given > commit, then the walk is skipped _even if there exist some with > higher generation number_. > > This heuristic is wrong! It must check the MAXIMUM generation number > of the reference commits, not the MINIMUM. > > This highlights a testing gap. t6600-test-reach.sh covers many > methods in commit-reach.c, including in_merge_bases() and > get_merge_bases_many(), but since these methods either restrict to > two input commits or actually look for the full list of merge bases, > they don't check this heuristic! > > Add a possible input to "test-tool reach" that tests > in_merge_bases_many() and add tests to t6600-test-reach.sh that > cover this heuristic. This includes cases for the reference commits > having generation above and below the generation of the input commit, > but also having maximum generation below the generation of the input > commit. > > The fix itself is to swap min_generation with a max_generation in > repo_in_merge_bases_many(). > > Helped-by: Johannes Schindelin <johannes.schindelin@gmx.de> > Reported-by: Srinidhi Kaushik <shrinidhi.kaushik@gmail.com> > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > --- > Fix in_merge_bases_many() with commit-graphs > > Johannes alerted me to the difficulties Srinidhi was having with > in_merge_bases_many() and commit-graphs. Sorry that I hadn't seen that > thread and the issues therein. > > After working with Johannes to investigate what was happening, we found > a 2-year-old bug in the generation number checks! > > Thanks, -Stolee > > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-739%2Fderrickstolee%2Fin-merge-bases-many-fix-v1 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-739/derrickstolee/in-merge-bases-many-fix-v1 > Pull-Request: https://github.com/gitgitgadget/git/pull/739 > > commit-reach.c | 8 ++++---- > t/helper/test-reach.c | 2 ++ > t/t6600-test-reach.sh | 30 ++++++++++++++++++++++++++++++ > 3 files changed, 36 insertions(+), 4 deletions(-) > > diff --git a/commit-reach.c b/commit-reach.c > index efd5925cbb..50175b159e 100644 > --- a/commit-reach.c > +++ b/commit-reach.c > @@ -321,7 +321,7 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, > { > struct commit_list *bases; > int ret = 0, i; > - uint32_t generation, min_generation = GENERATION_NUMBER_INFINITY; > + uint32_t generation, max_generation = GENERATION_NUMBER_ZERO; > > if (repo_parse_commit(r, commit)) > return ret; > @@ -330,12 +330,12 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, > return ret; > > generation = commit_graph_generation(reference[i]); > - if (generation < min_generation) > - min_generation = generation; > + if (generation > max_generation) > + max_generation = generation; > } > > generation = commit_graph_generation(commit); > - if (generation > min_generation) > + if (generation > max_generation) > return ret; This is correct; good catch. > bases = paint_down_to_common(r, commit, > diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c > index 14a3655442..cda804ed79 100644 > --- a/t/helper/test-reach.c > +++ b/t/helper/test-reach.c > @@ -107,6 +107,8 @@ int cmd__reach(int ac, const char **av) > printf("%s(A,B):%d\n", av[1], ref_newer(&oid_A, &oid_B)); > else if (!strcmp(av[1], "in_merge_bases")) > printf("%s(A,B):%d\n", av[1], in_merge_bases(A, B)); > + else if (!strcmp(av[1], "in_merge_bases_many")) > + printf("%s(A,X):%d\n", av[1], in_merge_bases_many(A, X_nr, X_array)); > else if (!strcmp(av[1], "is_descendant_of")) > printf("%s(A,X):%d\n", av[1], repo_is_descendant_of(r, A, X)); > else if (!strcmp(av[1], "get_merge_bases_many")) { > diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh > index 475564bee7..f807276337 100755 > --- a/t/t6600-test-reach.sh > +++ b/t/t6600-test-reach.sh > @@ -110,6 +110,36 @@ test_expect_success 'in_merge_bases:miss' ' > test_three_modes in_merge_bases > ' > > +test_expect_success 'in_merge_bases_many:hit' ' > + cat >input <<-\EOF && > + A:commit-6-8 > + X:commit-6-9 > + X:commit-5-7 > + EOF > + echo "in_merge_bases_many(A,X):1" >expect && > + test_three_modes in_merge_bases_many > +' > + > +test_expect_success 'in_merge_bases_many:miss' ' > + cat >input <<-\EOF && > + A:commit-6-8 > + X:commit-7-7 > + X:commit-8-6 > + EOF > + echo "in_merge_bases_many(A,X):0" >expect && > + test_three_modes in_merge_bases_many > +' > + > +test_expect_success 'in_merge_bases_many:miss-heuristic' ' > + cat >input <<-\EOF && > + A:commit-6-8 > + X:commit-7-5 > + X:commit-6-6 > + EOF > + echo "in_merge_bases_many(A,X):0" >expect && > + test_three_modes in_merge_bases_many > +' > + > test_expect_success 'is_descendant_of:hit' ' > cat >input <<-\EOF && > A:commit-5-7 > > base-commit: 47ae905ffb98cc4d4fd90083da6bc8dab55d9ecc > -- > gitgitgadget Thanks Derrick and Johannes, for identifying the problem and fixing the bug.
Derrick Stolee <stolee@gmail.com> writes: > Thanks for double-checking. I also think that dropping the > "hide the error" patch is prudent. Thanks again for a quick and straight-forward fix. As I mentioned elsewhere in the thread, it appears that we invented duplicate API with parallel implementation in get_reachable_subset(), which seems to be a strict superset of in_merge_bases_many(), and that may be what led to an initial and incorrect "get_reachable_subset() is not broken the same way as in_merge_bases_many() so use it instead" response. I haven't paid attention to the quality of implementation of get_reachable_subset() as much as in_merge_bases_many() (e.g. I know of an obvious way to optimize the latter) so far, but it would be wonderful if we can eventually rewrite in_merge_bases_many() as a very thin wrapper() around get_reachable_subset() without any downside.
On 10/2/2020 4:03 PM, Junio C Hamano wrote: > Derrick Stolee <stolee@gmail.com> writes: > >> Thanks for double-checking. I also think that dropping the >> "hide the error" patch is prudent. > > Thanks again for a quick and straight-forward fix. As I mentioned > elsewhere in the thread, it appears that we invented duplicate API > with parallel implementation in get_reachable_subset(), which seems > to be a strict superset of in_merge_bases_many(), and that may be > what led to an initial and incorrect "get_reachable_subset() is not > broken the same way as in_merge_bases_many() so use it instead" > response. I haven't paid attention to the quality of implementation > of get_reachable_subset() as much as in_merge_bases_many() (e.g. I > know of an obvious way to optimize the latter) so far, but it would > be wonderful if we can eventually rewrite in_merge_bases_many() as a > very thin wrapper() around get_reachable_subset() without any > downside. I have self-assigned myself https://github.com/gitgitgadget/git/issues/740 which will investigate these duplicates and see what can be done. I know that the initial refactoring focused on keeping the "public" interface the same and just creating shims between different method prototypes. In this case, it might be worth doing a full replacement since in_merge_bases_many() has so few callers. I'll be proactive to look for other cases that might be tricky for new contributors to find the "right" way to do it. Thanks, -Stolee
diff --git a/commit-reach.c b/commit-reach.c index efd5925cbb..50175b159e 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -321,7 +321,7 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, { struct commit_list *bases; int ret = 0, i; - uint32_t generation, min_generation = GENERATION_NUMBER_INFINITY; + uint32_t generation, max_generation = GENERATION_NUMBER_ZERO; if (repo_parse_commit(r, commit)) return ret; @@ -330,12 +330,12 @@ int repo_in_merge_bases_many(struct repository *r, struct commit *commit, return ret; generation = commit_graph_generation(reference[i]); - if (generation < min_generation) - min_generation = generation; + if (generation > max_generation) + max_generation = generation; } generation = commit_graph_generation(commit); - if (generation > min_generation) + if (generation > max_generation) return ret; bases = paint_down_to_common(r, commit, diff --git a/t/helper/test-reach.c b/t/helper/test-reach.c index 14a3655442..cda804ed79 100644 --- a/t/helper/test-reach.c +++ b/t/helper/test-reach.c @@ -107,6 +107,8 @@ int cmd__reach(int ac, const char **av) printf("%s(A,B):%d\n", av[1], ref_newer(&oid_A, &oid_B)); else if (!strcmp(av[1], "in_merge_bases")) printf("%s(A,B):%d\n", av[1], in_merge_bases(A, B)); + else if (!strcmp(av[1], "in_merge_bases_many")) + printf("%s(A,X):%d\n", av[1], in_merge_bases_many(A, X_nr, X_array)); else if (!strcmp(av[1], "is_descendant_of")) printf("%s(A,X):%d\n", av[1], repo_is_descendant_of(r, A, X)); else if (!strcmp(av[1], "get_merge_bases_many")) { diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh index 475564bee7..f807276337 100755 --- a/t/t6600-test-reach.sh +++ b/t/t6600-test-reach.sh @@ -110,6 +110,36 @@ test_expect_success 'in_merge_bases:miss' ' test_three_modes in_merge_bases ' +test_expect_success 'in_merge_bases_many:hit' ' + cat >input <<-\EOF && + A:commit-6-8 + X:commit-6-9 + X:commit-5-7 + EOF + echo "in_merge_bases_many(A,X):1" >expect && + test_three_modes in_merge_bases_many +' + +test_expect_success 'in_merge_bases_many:miss' ' + cat >input <<-\EOF && + A:commit-6-8 + X:commit-7-7 + X:commit-8-6 + EOF + echo "in_merge_bases_many(A,X):0" >expect && + test_three_modes in_merge_bases_many +' + +test_expect_success 'in_merge_bases_many:miss-heuristic' ' + cat >input <<-\EOF && + A:commit-6-8 + X:commit-7-5 + X:commit-6-6 + EOF + echo "in_merge_bases_many(A,X):0" >expect && + test_three_modes in_merge_bases_many +' + test_expect_success 'is_descendant_of:hit' ' cat >input <<-\EOF && A:commit-5-7