Message ID | 833779ad53eb4f57ae514f4e8964e397845f1ddd.1596941625.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | Implement Corrected Commit Date | expand |
On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote: > From: Abhishek Kumar <abhishekkumar8222@gmail.com> > > As corrected commit dates and topological levels cannot be compared > directly, we must handle commit graph chains with mixed generation > number definitions. > > While reading a commit graph file, we disable generation numbers if the > chain contains mixed generation numbers. > > While writing to commit graph chain, we write generation data chunk only > if the previous tip of chain had a generation data chunk. Using > `--split=replace` overwrites the existing chain and writes generation > data chunk regardless of previous tip. > > In t5324-split-commit-graph, we set up a repo with twelve commits and > write a base commit graph file with no generation data chunk. When add > three commits and write to chain again, Git does not write generation > data chunk even without setting GIT_TEST_COMMIT_GRAPH_NO_GDAT=1. Then, > as we replace the existing chain, Git writes a commit graph file with > generation data chunk. > > Signed-off-by: Abhishek Kumar <abhishekkumar8222@gmail.com> > --- > commit-graph.c | 14 ++++++++ > t/t5324-split-commit-graph.sh | 66 +++++++++++++++++++++++++++++++++++ > 2 files changed, 80 insertions(+) > > diff --git a/commit-graph.c b/commit-graph.c > index d0f977852b..c6b6111adf 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -674,6 +674,14 @@ int generation_numbers_enabled(struct repository *r) > if (!g->num_commits) > return 0; > > + /* We cannot compare topological levels and corrected commit dates */ > + while (g->base_graph) { > + warning(_("commit-graph-chain contains mixed generation versions")); This warning is premature. It will add a warning whenever we have a split commit-graph, regardless of an incorrect chain. > + if ((g->chunk_generation_data == NULL) ^ (g->base_graph->chunk_generation_data == NULL)) Hm. A bit-wise XOR here? That seems unfortunate. I think that it is easier to focus on the > + return 0; > + g = g->base_graph; > + } > + Hm. So this scenario actually disables generation numbers completely in the event that anything in the chain disagrees. I think this is not the right way to approach the situation, as it will significantly punish users in this state with slow performance. The patch I sent [1] is probably better: it uses generation number v1 if the tip of the chain does not have a GDAT chunk. [1] https://lore.kernel.org/git/a3910f82-ab2e-bf35-ac43-c30d77f3c96b@gmail.com/ > first_generation = get_be32(g->chunk_commit_data + > g->hash_len + 8) >> 2; > > @@ -2186,6 +2194,9 @@ int write_commit_graph(struct object_directory *odb, > > g = ctx->r->objects->commit_graph; > > + if (g && !g->chunk_generation_data) > + ctx->write_generation_data = 0; > + > while (g) { > ctx->num_commit_graphs_before++; > g = g->base_graph; > @@ -2204,6 +2215,9 @@ int write_commit_graph(struct object_directory *odb, > > if (ctx->split_opts) > replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; > + > + if (replace) > + ctx->write_generation_data = 1; > } Please make a point to move the line that checks GIT_TEST_COMMIT_GRAPH_NO_GDAT from its current location to after this line. We want to make sure that the environment variable is checked _last_. The best location is likely the start of the implementation of compute_generation_numbers(), or immediately before the call to the method. > +test_expect_success 'setup repo for mixed generation commit-graph-chain' ' > + mkdir mixed && > + graphdir=".git/objects/info/commit-graphs" && > + cd "$TRASH_DIRECTORY/mixed" && > + git init && > + git config core.commitGraph true && > + git config gc.writeCommitGraph false && > + for i in $(test_seq 3) > + do > + test_commit $i && > + git branch commits/$i || return 1 > + done && > + git reset --hard commits/1 && > + for i in $(test_seq 4 5) > + do > + test_commit $i && > + git branch commits/$i || return 1 > + done && > + git reset --hard commits/2 && > + for i in $(test_seq 6 10) > + do > + test_commit $i && > + git branch commits/$i || return 1 > + done && > + git reset --hard commits/2 && > + git merge commits/4 && > + git branch merge/1 && > + git reset --hard commits/4 && > + git merge commits/6 && > + git branch merge/2 && > + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split && > + test-tool read-graph >output && > + cat >expect <<-EOF && > + header: 43475048 1 1 3 0 > + num_commits: 12 > + chunks: oid_fanout oid_lookup commit_metadata > + EOF > + test_cmp expect output > +' > + > +test_expect_success 'does not write generation data chunk if not present on existing tip' ' > + cd "$TRASH_DIRECTORY/mixed" && > + git reset --hard commits/3 && > + git merge merge/1 && > + git merge commits/5 && > + git merge merge/2 && > + git branch merge/3 && > + git commit-graph write --reachable --split && > + test-tool read-graph >output && > + cat >expect <<-EOF && > + header: 43475048 1 1 4 1 > + num_commits: 3 > + chunks: oid_fanout oid_lookup commit_metadata > + EOF > + test_cmp expect output > +' > + > +test_expect_success 'writes generation data chunk when commit-graph chain is replaced' ' > + cd "$TRASH_DIRECTORY/mixed" && > + git commit-graph write --reachable --split='replace' && > + test_path_is_file $graphdir/commit-graph-chain && > + test_line_count = 1 $graphdir/commit-graph-chain && > + verify_chain_files_exist $graphdir && > + graph_read_expect 15 > +' It would be valuable to double-check here that the values in the GDAT chunk are correct. I'm concerned about the possibility that the 'generation' member of struct commit_graph_data gets filled with topological level during parsing and then that is written as an offset into the CDAT chunk. Perhaps this is best left for a follow-up series that updates the 'verify' subcommand to check the GDAT chunk. Thanks, -Stolee
On Mon, Aug 10, 2020 at 12:42:29PM -0400, Derrick Stolee wrote: > On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote: > > ... > > Hm. So this scenario actually disables generation numbers completely > in the event that anything in the chain disagrees. I think this is > not the right way to approach the situation, as it will significantly > punish users in this state with slow performance. > > The patch I sent [1] is probably better: it uses generation number > v1 if the tip of the chain does not have a GDAT chunk. > > [1] https://lore.kernel.org/git/a3910f82-ab2e-bf35-ac43-c30d77f3c96b@gmail.com/ > Yes, the patch is an clear improvement over my (convoluted and incorrect) logic. Will add. > > ... > > Please make a point to move the line that checks GIT_TEST_COMMIT_GRAPH_NO_GDAT > from its current location to after this line. We want to make sure that the > environment variable is checked _last_. The best location is likely the start > of the implementation of compute_generation_numbers(), or immediately before > the call to the method. > Sure, will do. > > ... > > It would be valuable to double-check here that the values in the GDAT chunk > are correct. I'm concerned about the possibility that the 'generation' > member of struct commit_graph_data gets filled with topological level during > parsing and then that is written as an offset into the CDAT chunk. > > Perhaps this is best left for a follow-up series that updates the 'verify' > subcommand to check the GDAT chunk. If I can understand it correctly, one of ways to update 'verify' subcommand to check the GDAT chunk as well would to be make use of the flag variable introduced in your patch. We can isolate generation number related checks and run checks once with flag = 1 (checking corrected commit dates) and once with flag = 0 (checking topological levels). This has the unfortunate effect of filling all commits twice, but as we cannot change the commit_graph_data->generation any other way, I see no alternatives without changing how commit_graph_generation() works. Would it make more sense if we add the flag to struct commit_graph instead of making it depend solely on g->chunk_generation_data and set it within parse_commit_graph()? We would be able to control the behavior of fill_commit_graph_info() and we will not need to check g->chunk_generation_data before filling every commit. > > Thanks, > -Stolee > Thanks - Abhishek
On 8/11/2020 7:36 AM, Abhishek Kumar wrote: > On Mon, Aug 10, 2020 at 12:42:29PM -0400, Derrick Stolee wrote: >> On 8/8/2020 10:53 PM, Abhishek Kumar via GitGitGadget wrote: >> >> ... >> >> Hm. So this scenario actually disables generation numbers completely >> in the event that anything in the chain disagrees. I think this is >> not the right way to approach the situation, as it will significantly >> punish users in this state with slow performance. >> >> The patch I sent [1] is probably better: it uses generation number >> v1 if the tip of the chain does not have a GDAT chunk. >> >> [1] https://lore.kernel.org/git/a3910f82-ab2e-bf35-ac43-c30d77f3c96b@gmail.com/ >> > > Yes, the patch is an clear improvement over my (convoluted and incorrect) > logic. Will add. > >> >> ... >> >> Please make a point to move the line that checks GIT_TEST_COMMIT_GRAPH_NO_GDAT >> from its current location to after this line. We want to make sure that the >> environment variable is checked _last_. The best location is likely the start >> of the implementation of compute_generation_numbers(), or immediately before >> the call to the method. >> > > Sure, will do. > >> >> ... >> >> It would be valuable to double-check here that the values in the GDAT chunk >> are correct. I'm concerned about the possibility that the 'generation' >> member of struct commit_graph_data gets filled with topological level during >> parsing and then that is written as an offset into the CDAT chunk. >> >> Perhaps this is best left for a follow-up series that updates the 'verify' >> subcommand to check the GDAT chunk. > > If I can understand it correctly, one of ways to update 'verify' > subcommand to check the GDAT chunk as well would to be make use of the > flag variable introduced in your patch. We can isolate generation number > related checks and run checks once with flag = 1 (checking corrected > commit dates) and once with flag = 0 (checking topological levels). > > This has the unfortunate effect of filling all commits twice, but as we > cannot change the commit_graph_data->generation any other way, I see no > alternatives without changing how commit_graph_generation() works. > > Would it make more sense if we add the flag to struct commit_graph > instead of making it depend solely on g->chunk_generation_data and set > it within parse_commit_graph()? > > We would be able to control the behavior of fill_commit_graph_info() and > we will not need to check g->chunk_generation_data before filling every > commit. I missed that you _already_ updated the logic in verify_commit_graph() based on the generation. That logic should catch the problem, so it might be enough to just add some "git commit-graph verify" commands into your multi-level tests. Specifically, the end result is this check: corrected_commit_date = commit_graph_generation(graph_commit); if (corrected_commit_date < max_parent_corrected_commit_date + 1) graph_report(_("commit-graph generation for commit %s is %"PRItime" < %"PRItime), oid_to_hex(&cur_oid), corrected_commit_date, max_parent_corrected_commit_date + 1); This will catch the order violations I was proposing could happen. It doesn't go the extra mile to ensure that the commit-graph stores the exact correct value or that the two bits of data are correct (both topo-level and corrected commit date). That is fine for now, and we can revisit if necessary. The diff below makes some tweaks to your split-level test to show the logic _was_ incorrect without my patch. Please incorporate the test changes into your series. Note in particular that I added a base layer that includes the GDAT chunk and _then_ adds a layer without the GDAT chunk. That is an important case! Thanks, -Stolee --- >8 --- diff --git a/commit-graph.c b/commit-graph.c index 17623274d9..d891a8ba3a 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -674,14 +674,6 @@ int generation_numbers_enabled(struct repository *r) if (!g->num_commits) return 0; - /* We cannot compare topological levels and corrected commit dates */ - while (g->base_graph) { - warning(_("commit-graph-chain contains mixed generation versions")); - if ((g->chunk_generation_data == NULL) ^ (g->base_graph->chunk_generation_data == NULL)) - return 0; - g = g->base_graph; - } - first_generation = get_be32(g->chunk_commit_data + g->hash_len + 8) >> 2; @@ -787,7 +779,7 @@ static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, date_low = get_be32(commit_data + g->hash_len + 12); item->date = (timestamp_t)((date_high << 32) | date_low); - if (g->chunk_generation_data && (flags & COMMIT_GRAPH_GENERATION_V2)) + if (g->chunk_generation_data) graph_data->generation = item->date + (timestamp_t) get_be32(g->chunk_generation_data + sizeof(uint32_t) * lex_index); else diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 1a9be5e656..721515cc23 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -443,6 +443,7 @@ test_expect_success 'setup repo for mixed generation commit-graph-chain' ' test_commit $i && git branch commits/$i || return 1 done && + git commit-graph write --reachable --split && git reset --hard commits/2 && for i in $(test_seq 6 10) do @@ -455,14 +456,15 @@ test_expect_success 'setup repo for mixed generation commit-graph-chain' ' git reset --hard commits/4 && git merge commits/6 && git branch merge/2 && - GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split=no-merge && test-tool read-graph >output && cat >expect <<-EOF && - header: 43475048 1 1 3 0 - num_commits: 12 + header: 43475048 1 1 4 1 + num_commits: 7 chunks: oid_fanout oid_lookup commit_metadata EOF - test_cmp expect output + test_cmp expect output && + git commit-graph verify ' test_expect_success 'does not write generation data chunk if not present on existing tip' ' @@ -472,23 +474,25 @@ test_expect_success 'does not write generation data chunk if not present on exis git merge commits/5 && git merge merge/2 && git branch merge/3 && - git commit-graph write --reachable --split && + git commit-graph write --reachable --split=no-merge && test-tool read-graph >output && cat >expect <<-EOF && header: 43475048 1 1 4 1 num_commits: 3 chunks: oid_fanout oid_lookup commit_metadata EOF - test_cmp expect output + test_cmp expect output && + git commit-graph verify ' test_expect_success 'writes generation data chunk when commit-graph chain is replaced' ' cd "$TRASH_DIRECTORY/mixed" && - git commit-graph write --reachable --split='replace' && + git commit-graph write --reachable --split=replace && test_path_is_file $graphdir/commit-graph-chain && test_line_count = 1 $graphdir/commit-graph-chain && verify_chain_files_exist $graphdir && - graph_read_expect 15 + graph_read_expect 15 && + git commit-graph verify ' test_done
diff --git a/commit-graph.c b/commit-graph.c index d0f977852b..c6b6111adf 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -674,6 +674,14 @@ int generation_numbers_enabled(struct repository *r) if (!g->num_commits) return 0; + /* We cannot compare topological levels and corrected commit dates */ + while (g->base_graph) { + warning(_("commit-graph-chain contains mixed generation versions")); + if ((g->chunk_generation_data == NULL) ^ (g->base_graph->chunk_generation_data == NULL)) + return 0; + g = g->base_graph; + } + first_generation = get_be32(g->chunk_commit_data + g->hash_len + 8) >> 2; @@ -2186,6 +2194,9 @@ int write_commit_graph(struct object_directory *odb, g = ctx->r->objects->commit_graph; + if (g && !g->chunk_generation_data) + ctx->write_generation_data = 0; + while (g) { ctx->num_commit_graphs_before++; g = g->base_graph; @@ -2204,6 +2215,9 @@ int write_commit_graph(struct object_directory *odb, if (ctx->split_opts) replace = ctx->split_opts->flags & COMMIT_GRAPH_SPLIT_REPLACE; + + if (replace) + ctx->write_generation_data = 1; } ctx->approx_nr_objects = approximate_object_count(); diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh index 6b25c3d9ce..1a9be5e656 100755 --- a/t/t5324-split-commit-graph.sh +++ b/t/t5324-split-commit-graph.sh @@ -425,4 +425,70 @@ done <<\EOF 0600 -r-------- EOF +test_expect_success 'setup repo for mixed generation commit-graph-chain' ' + mkdir mixed && + graphdir=".git/objects/info/commit-graphs" && + cd "$TRASH_DIRECTORY/mixed" && + git init && + git config core.commitGraph true && + git config gc.writeCommitGraph false && + for i in $(test_seq 3) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git reset --hard commits/1 && + for i in $(test_seq 4 5) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git reset --hard commits/2 && + for i in $(test_seq 6 10) + do + test_commit $i && + git branch commits/$i || return 1 + done && + git reset --hard commits/2 && + git merge commits/4 && + git branch merge/1 && + git reset --hard commits/4 && + git merge commits/6 && + git branch merge/2 && + GIT_TEST_COMMIT_GRAPH_NO_GDAT=1 git commit-graph write --reachable --split && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 1 3 0 + num_commits: 12 + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output +' + +test_expect_success 'does not write generation data chunk if not present on existing tip' ' + cd "$TRASH_DIRECTORY/mixed" && + git reset --hard commits/3 && + git merge merge/1 && + git merge commits/5 && + git merge merge/2 && + git branch merge/3 && + git commit-graph write --reachable --split && + test-tool read-graph >output && + cat >expect <<-EOF && + header: 43475048 1 1 4 1 + num_commits: 3 + chunks: oid_fanout oid_lookup commit_metadata + EOF + test_cmp expect output +' + +test_expect_success 'writes generation data chunk when commit-graph chain is replaced' ' + cd "$TRASH_DIRECTORY/mixed" && + git commit-graph write --reachable --split='replace' && + test_path_is_file $graphdir/commit-graph-chain && + test_line_count = 1 $graphdir/commit-graph-chain && + verify_chain_files_exist $graphdir && + graph_read_expect 15 +' + test_done