Message ID | pull.850.v2.git.1612234883.gitgitgadget@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | Generation Number v2: Fix a tricky split graph bug | expand |
On Tue, Feb 02, 2021 at 03:01:17AM +0000, Derrick Stolee via GitGitGadget wrote: > Updates in v2 > ============= > > * Fixed some typos or other clarifications in commit messages. > > * The loop assigning read_generation_data is skipped if they already all > agree with value 1. > > * I split compute_generation_numbers into two methods. This essentially > splits the previous patch 4 into patches 4 & 5 here. The new patch 4 just > splits the logic as-is, then the new patch 5 does the re-initialization > of generation values when in the upgrade scenario. Thanks for addressing my nits. I'm much happier with patch 5 as a result of your changes, so I think this series was a positive step forward not only for fixing a correctness bug, but also for improving the readability of commit-graph.c. Both the range-diff and the new patches look good to me, so this series has my: Reviewed-by: Taylor Blau <me@ttaylorr.com> Thanks, Taylor
On Tue, Feb 02, 2021 at 03:01:17AM +0000, Derrick Stolee via GitGitGadget wrote: > Here is a bugfix for the recently-landed-in-next generation v2 topic > (ak/corrected-commit-date). > > This was occasionally hitting us when computing new commit-graphs on > existing repositories with the new bits. It was very hard to reproduce, and > it turns out to be due to not parsing commits before accessing generation > number data. Doing so in the right place demonstrates the bug of recomputing > the corrected commit date even for commits in lower layers with computed > values. > > The fix is split into these steps: > > 1. Parse commits more often before accessing their data. (This allows the > bug to be demonstrated in the test suite.) > 2. Check the full commit-graph chain for generation data chunks. > 3. Don't compute corrected commit dates if the lower layers do not support > them. > 4. Parse the commit-graph file more often. > > Thanks, -Stolee > Thanks for the fast bug-fix. It must have been hard to track down why as we weren't able to reproduce this behaviour before. As Taylor said before, this patch is a clear improvement in readability and correctness. Thanks - Abhishek > > Updates in v2 > ============= > > * Fixed some typos or other clarifications in commit messages. > > * The loop assigning read_generation_data is skipped if they already all > agree with value 1. > > * I split compute_generation_numbers into two methods. This essentially > splits the previous patch 4 into patches 4 & 5 here. The new patch 4 just > splits the logic as-is, then the new patch 5 does the re-initialization > of generation values when in the upgrade scenario. > > Derrick Stolee (6): > commit-graph: use repo_parse_commit > commit-graph: always parse before commit_graph_data_at() > commit-graph: validate layers for generation data > commit-graph: compute generations separately > commit-graph: be extra careful about mixed generations > commit-graph: prepare commit graph > > commit-graph.c | 138 +++++++++++++++++++++++++++++----------- > commit.h | 5 +- > t/t5318-commit-graph.sh | 21 ++++++ > 3 files changed, 125 insertions(+), 39 deletions(-) > > > base-commit: 5a3b130cad0d5c770f766e3af6d32b41766374c0 > Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-850%2Fderrickstolee%2Fgen-v2-upgrade-fix-v2 > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-850/derrickstolee/gen-v2-upgrade-fix-v2 > Pull-Request: https://github.com/gitgitgadget/git/pull/850 > > Range-diff vs v1: > > 1: 9c605c99f66 = 1: 9c605c99f66 commit-graph: use repo_parse_commit > 2: 82afa811dff ! 2: 454b183b9ba commit-graph: always parse before commit_graph_data_at() > @@ Commit message > problems when writing a commit-graph with corrected commit dates based > on a commit-graph without them. > > - It has been difficult to identify the issue here becaus it was so hard > + It has been difficult to identify the issue here because it was so hard > to reproduce. It relies on this uninitialized data having a non-zero > value, but also on specifically in a way that overwrites the existing > data. > 3: d554fa30660 ! 3: 3d223fa2156 commit-graph: validate layers for generation data > @@ commit-graph.c: static struct commit_graph *load_commit_graph_chain(struct repos > > - if (!g) > - return; > -- > -- read_generation_data = !!g->chunk_generation_data; > + while (read_generation_data && p) { > + read_generation_data = p->read_generation_data; > + p = p->base_graph; > + } > > +- read_generation_data = !!g->chunk_generation_data; > ++ if (read_generation_data) > ++ return 1; > + > while (g) { > - g->read_generation_data = read_generation_data; > +- g->read_generation_data = read_generation_data; > ++ g->read_generation_data = 0; > g = g->base_graph; > } > + > -+ return read_generation_data; > ++ return 0; > } > > struct commit_graph *read_commit_graph_one(struct repository *r, > -: ----------- > 4: 05248ff222f commit-graph: compute generations separately > 4: b267a9653a7 ! 5: 9bccee8fb63 commit-graph: be extra careful about mixed generations > @@ Commit message > existing commit-graph data has no corrected commit dates. > > While this improves our situation somewhat, we have not completely > - solved the issue for correctly computing generation numbers for mixes > + solved the issue for correctly computing generation numbers for mixed > layers. That follows in the next change. > > Signed-off-by: Derrick Stolee <dstolee@microsoft.com> > @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph > _("Computing commit graph generation numbers"), > ctx->commits.nr); > + > -+ if (ctx->write_generation_data && !ctx->trust_generation_numbers) { > ++ if (!ctx->trust_generation_numbers) { > + for (i = 0; i < ctx->commits.nr; i++) { > + struct commit *c = ctx->commits.list[i]; > + repo_parse_commit(ctx->r, c); > @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph > + > for (i = 0; i < ctx->commits.nr; i++) { > struct commit *c = ctx->commits.list[i]; > - uint32_t level; > -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx) > - corrected_commit_date = commit_graph_data_at(parent->item)->generation; > - > - if (level == GENERATION_NUMBER_ZERO || > -- corrected_commit_date == GENERATION_NUMBER_ZERO) { > -+ (ctx->write_generation_data && > -+ corrected_commit_date == GENERATION_NUMBER_ZERO)) { > - all_parents_computed = 0; > - commit_list_insert(parent->item, &list); > - break; > -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx) > - max_level = GENERATION_NUMBER_V1_MAX - 1; > - *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; > - > -- if (current->date && current->date > max_corrected_commit_date) > -- max_corrected_commit_date = current->date - 1; > -- commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; > -- > -- if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) > -- ctx->num_generation_data_overflows++; > -+ if (ctx->write_generation_data) { > -+ timestamp_t cur_g; > -+ if (current->date && current->date > max_corrected_commit_date) > -+ max_corrected_commit_date = current->date - 1; > -+ cur_g = commit_graph_data_at(current)->generation > -+ = max_corrected_commit_date + 1; > -+ if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) > -+ ctx->num_generation_data_overflows++; > -+ } > - } > - } > - } > + timestamp_t corrected_commit_date; > @@ commit-graph.c: int write_commit_graph(struct object_directory *odb, > } else > ctx->num_commit_graphs_after = 1; > @@ commit-graph.c: int write_commit_graph(struct object_directory *odb, > - validate_mixed_generation_chain(ctx->r->objects->commit_graph); > + ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph); > > - compute_generation_numbers(ctx); > - > + compute_topological_levels(ctx); > + if (ctx->write_generation_data) > 5: dddeec30ebf ! 6: 38086c85b52 commit-graph: prepare commit graph > @@ Commit message > commit-graph: prepare commit graph > > Before checking if the repository has a commit-graph loaded, be sure > - to run prepare_commit_graph(). This is necessary because without this > - instance we would not initialize the topo_levels slab for each of the > - struct commit_graphs in the chain before we start to parse the > - commits. This leads to possibly recomputing the topological levels for > - commits in lower layers even when we are adding a small number of > - commits on top. > + to run prepare_commit_graph(). This is necessary because otherwise > + the topo_levels slab is not initialized. As we compute topo_levels for > + the new commits, we iterate further into the lower layers since the > + first visit to each commit looks as though the topo_level is not > + populated. > > By properly initializing the topo_slab, we fix the previously broken > case of a split commit graph where a base layer has the > > -- > gitgitgadget
Here is a bugfix for the recently-landed-in-next generation v2 topic (ak/corrected-commit-date). This was occasionally hitting us when computing new commit-graphs on existing repositories with the new bits. It was very hard to reproduce, and it turns out to be due to not parsing commits before accessing generation number data. Doing so in the right place demonstrates the bug of recomputing the corrected commit date even for commits in lower layers with computed values. The fix is split into these steps: 1. Parse commits more often before accessing their data. (This allows the bug to be demonstrated in the test suite.) 2. Check the full commit-graph chain for generation data chunks. 3. Don't compute corrected commit dates if the lower layers do not support them. 4. Parse the commit-graph file more often. Thanks, -Stolee Updates in v2 ============= * Fixed some typos or other clarifications in commit messages. * The loop assigning read_generation_data is skipped if they already all agree with value 1. * I split compute_generation_numbers into two methods. This essentially splits the previous patch 4 into patches 4 & 5 here. The new patch 4 just splits the logic as-is, then the new patch 5 does the re-initialization of generation values when in the upgrade scenario. Derrick Stolee (6): commit-graph: use repo_parse_commit commit-graph: always parse before commit_graph_data_at() commit-graph: validate layers for generation data commit-graph: compute generations separately commit-graph: be extra careful about mixed generations commit-graph: prepare commit graph commit-graph.c | 138 +++++++++++++++++++++++++++++----------- commit.h | 5 +- t/t5318-commit-graph.sh | 21 ++++++ 3 files changed, 125 insertions(+), 39 deletions(-) base-commit: 5a3b130cad0d5c770f766e3af6d32b41766374c0 Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-850%2Fderrickstolee%2Fgen-v2-upgrade-fix-v2 Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-850/derrickstolee/gen-v2-upgrade-fix-v2 Pull-Request: https://github.com/gitgitgadget/git/pull/850 Range-diff vs v1: 1: 9c605c99f66 = 1: 9c605c99f66 commit-graph: use repo_parse_commit 2: 82afa811dff ! 2: 454b183b9ba commit-graph: always parse before commit_graph_data_at() @@ Commit message problems when writing a commit-graph with corrected commit dates based on a commit-graph without them. - It has been difficult to identify the issue here becaus it was so hard + It has been difficult to identify the issue here because it was so hard to reproduce. It relies on this uninitialized data having a non-zero value, but also on specifically in a way that overwrites the existing data. 3: d554fa30660 ! 3: 3d223fa2156 commit-graph: validate layers for generation data @@ commit-graph.c: static struct commit_graph *load_commit_graph_chain(struct repos - if (!g) - return; -- -- read_generation_data = !!g->chunk_generation_data; + while (read_generation_data && p) { + read_generation_data = p->read_generation_data; + p = p->base_graph; + } +- read_generation_data = !!g->chunk_generation_data; ++ if (read_generation_data) ++ return 1; + while (g) { - g->read_generation_data = read_generation_data; +- g->read_generation_data = read_generation_data; ++ g->read_generation_data = 0; g = g->base_graph; } + -+ return read_generation_data; ++ return 0; } struct commit_graph *read_commit_graph_one(struct repository *r, -: ----------- > 4: 05248ff222f commit-graph: compute generations separately 4: b267a9653a7 ! 5: 9bccee8fb63 commit-graph: be extra careful about mixed generations @@ Commit message existing commit-graph data has no corrected commit dates. While this improves our situation somewhat, we have not completely - solved the issue for correctly computing generation numbers for mixes + solved the issue for correctly computing generation numbers for mixed layers. That follows in the next change. Signed-off-by: Derrick Stolee <dstolee@microsoft.com> @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph _("Computing commit graph generation numbers"), ctx->commits.nr); + -+ if (ctx->write_generation_data && !ctx->trust_generation_numbers) { ++ if (!ctx->trust_generation_numbers) { + for (i = 0; i < ctx->commits.nr; i++) { + struct commit *c = ctx->commits.list[i]; + repo_parse_commit(ctx->r, c); @@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph + for (i = 0; i < ctx->commits.nr; i++) { struct commit *c = ctx->commits.list[i]; - uint32_t level; -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx) - corrected_commit_date = commit_graph_data_at(parent->item)->generation; - - if (level == GENERATION_NUMBER_ZERO || -- corrected_commit_date == GENERATION_NUMBER_ZERO) { -+ (ctx->write_generation_data && -+ corrected_commit_date == GENERATION_NUMBER_ZERO)) { - all_parents_computed = 0; - commit_list_insert(parent->item, &list); - break; -@@ commit-graph.c: static void compute_generation_numbers(struct write_commit_graph_context *ctx) - max_level = GENERATION_NUMBER_V1_MAX - 1; - *topo_level_slab_at(ctx->topo_levels, current) = max_level + 1; - -- if (current->date && current->date > max_corrected_commit_date) -- max_corrected_commit_date = current->date - 1; -- commit_graph_data_at(current)->generation = max_corrected_commit_date + 1; -- -- if (commit_graph_data_at(current)->generation - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) -- ctx->num_generation_data_overflows++; -+ if (ctx->write_generation_data) { -+ timestamp_t cur_g; -+ if (current->date && current->date > max_corrected_commit_date) -+ max_corrected_commit_date = current->date - 1; -+ cur_g = commit_graph_data_at(current)->generation -+ = max_corrected_commit_date + 1; -+ if (cur_g - current->date > GENERATION_NUMBER_V2_OFFSET_MAX) -+ ctx->num_generation_data_overflows++; -+ } - } - } - } + timestamp_t corrected_commit_date; @@ commit-graph.c: int write_commit_graph(struct object_directory *odb, } else ctx->num_commit_graphs_after = 1; @@ commit-graph.c: int write_commit_graph(struct object_directory *odb, - validate_mixed_generation_chain(ctx->r->objects->commit_graph); + ctx->trust_generation_numbers = validate_mixed_generation_chain(ctx->r->objects->commit_graph); - compute_generation_numbers(ctx); - + compute_topological_levels(ctx); + if (ctx->write_generation_data) 5: dddeec30ebf ! 6: 38086c85b52 commit-graph: prepare commit graph @@ Commit message commit-graph: prepare commit graph Before checking if the repository has a commit-graph loaded, be sure - to run prepare_commit_graph(). This is necessary because without this - instance we would not initialize the topo_levels slab for each of the - struct commit_graphs in the chain before we start to parse the - commits. This leads to possibly recomputing the topological levels for - commits in lower layers even when we are adding a small number of - commits on top. + to run prepare_commit_graph(). This is necessary because otherwise + the topo_levels slab is not initialized. As we compute topo_levels for + the new commits, we iterate further into the lower layers since the + first visit to each commit looks as though the topo_level is not + populated. By properly initializing the topo_slab, we fix the previously broken case of a split commit graph where a base layer has the