Message ID | 83feabeebb5f035059758fba1ca5cf74f3a22c91.1612183647.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Speed up remove_redundant() | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > The important piece is to ensure we short-circuit the walk when we find > that there is a single non-redundant commit. This happens frequently > when looking for merge-bases or comparing several tags with 'git > merge-base --independent'. Use a new count 'count_still_independent' and > if that hits 1 we can stop walking. That is because when you are left one single thing, it may be able to reach many other things, but the fact that it by itself won't be reachable by remaining independent things will not change (because, that sole remaining independent thing is itself)? > To update 'count_still_independent' properly, we add use of the RESULT > flag on the input commits. Then we can detect when we reach one of these > commits and decrease the count. We need to remove the RESULT flag at > that moment because we might re-visit that commit when popping the > stack. > > We use the STALE flag to mark parents that have been added to the new > walk_start list, but we need to clear that flag before we start walking > so those flags don't halt our depth-first-search walk. > > On my copy of the Linux kernel repository, the performance of 'git > merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11 > seconds. These two numbers are with commit-graph fully populated with usable generation numbers, I presume, and it is quite impressive.
On 2/1/2021 3:05 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> The important piece is to ensure we short-circuit the walk when we find >> that there is a single non-redundant commit. This happens frequently >> when looking for merge-bases or comparing several tags with 'git >> merge-base --independent'. Use a new count 'count_still_independent' and >> if that hits 1 we can stop walking. > > That is because when you are left one single thing, it may be able > to reach many other things, but the fact that it by itself won't be > reachable by remaining independent things will not change (because, > that sole remaining independent thing is itself)? Right. If there is only one non-STALE input commit left, then it will be the only returned result. We will never find that all commits are redundant because the commit graph is acyclic. The performance improvement comes from halting the DFS walk: there might be more commits to walk but they won't change the result. >> To update 'count_still_independent' properly, we add use of the RESULT >> flag on the input commits. Then we can detect when we reach one of these >> commits and decrease the count. We need to remove the RESULT flag at >> that moment because we might re-visit that commit when popping the >> stack. >> >> We use the STALE flag to mark parents that have been added to the new >> walk_start list, but we need to clear that flag before we start walking >> so those flags don't halt our depth-first-search walk. >> >> On my copy of the Linux kernel repository, the performance of 'git >> merge-base --independent <all-tags>' goes from 1.1 seconds to 0.11 >> seconds. > > These two numbers are with commit-graph fully populated with usable > generation numbers, I presume, and it is quite impressive. Yes, these numbers are with a full commit-graph. Thanks, -Stolee
diff --git a/commit-reach.c b/commit-reach.c index 7bf52e94429..d3a6e2bdd04 100644 --- a/commit-reach.c +++ b/commit-reach.c @@ -29,6 +29,10 @@ static int compare_commits_by_gen(const void *_a, const void *_b) return -1; if (generation_a > generation_b) return 1; + if (a->date < b->date) + return -1; + if (a->date > b->date) + return 1; return 0; } @@ -231,11 +235,10 @@ static int remove_redundant_no_gen(struct repository *r, static int remove_redundant_with_gen(struct repository *r, struct commit **array, int cnt) { - int i, count_non_stale = 0; + int i, count_non_stale = 0, count_still_independent = cnt; timestamp_t min_generation = GENERATION_NUMBER_INFINITY; struct commit **walk_start; size_t walk_start_nr = 0, walk_start_alloc = cnt; - struct prio_queue queue = { compare_commits_by_gen_then_commit_date }; ALLOC_ARRAY(walk_start, walk_start_alloc); @@ -245,6 +248,7 @@ static int remove_redundant_with_gen(struct repository *r, timestamp_t generation; repo_parse_commit(r, array[i]); + array[i]->object.flags |= RESULT; parents = array[i]->parents; while (parents) { @@ -253,7 +257,6 @@ static int remove_redundant_with_gen(struct repository *r, parents->item->object.flags |= STALE; ALLOC_GROW(walk_start, walk_start_nr + 1, walk_start_alloc); walk_start[walk_start_nr++] = parents->item; - prio_queue_put(&queue, parents->item); } parents = parents->next; } @@ -264,26 +267,63 @@ static int remove_redundant_with_gen(struct repository *r, min_generation = generation; } - /* push the STALE bits up to min generation */ - while (queue.nr) { - struct commit_list *parents; - struct commit *c = prio_queue_get(&queue); + QSORT(walk_start, walk_start_nr, compare_commits_by_gen); - repo_parse_commit(r, c); + /* remove STALE bit for now to allow walking through parents */ + for (i = 0; i < walk_start_nr; i++) + walk_start[i]->object.flags &= ~STALE; - if (commit_graph_generation(c) < min_generation) - continue; + /* + * Start walking from the highest generation. Hopefully, it will + * find all other items during the first-parent walk, and we can + * terminate early. Otherwise, we will do the same amount of work + * as before. + */ + for (i = walk_start_nr - 1; i >= 0 && count_still_independent > 1; i--) { + /* push the STALE bits up to min generation */ + struct commit_list *stack = NULL; - parents = c->parents; - while (parents) { - if (!(parents->item->object.flags & STALE)) { - parents->item->object.flags |= STALE; - prio_queue_put(&queue, parents->item); + commit_list_insert(walk_start[i], &stack); + walk_start[i]->object.flags |= STALE; + + while (stack) { + struct commit_list *parents; + struct commit *c = stack->item; + + repo_parse_commit(r, c); + + if (c->object.flags & RESULT) { + c->object.flags &= ~RESULT; + if (--count_still_independent <= 1) + break; } - parents = parents->next; + + if (commit_graph_generation(c) < min_generation) { + pop_commit(&stack); + continue; + } + + parents = c->parents; + while (parents) { + if (!(parents->item->object.flags & STALE)) { + parents->item->object.flags |= STALE; + commit_list_insert(parents->item, &stack); + break; + } + parents = parents->next; + } + + /* pop if all parents have been visited already */ + if (!parents) + pop_commit(&stack); } + free_commit_list(stack); } + /* clear result */ + for (i = 0; i < cnt; i++) + array[i]->object.flags &= ~RESULT; + /* rearrange array */ for (i = count_non_stale = 0; i < cnt; i++) { if (!(array[i]->object.flags & STALE))