@@ -30,10 +30,9 @@ static inline unsigned char get_bitmask(uint32_t pos)
static int load_bloom_filter_from_graph(struct commit_graph *g,
struct bloom_filter *filter,
- struct commit *c)
+ uint32_t graph_pos)
{
uint32_t lex_pos, start_index, end_index;
- uint32_t graph_pos = commit_graph_position(c);
while (graph_pos < g->num_commits_in_base)
g = g->base_graph;
@@ -203,9 +202,10 @@ struct bloom_filter *get_or_compute_bloom_filter(struct repository *r,
filter = bloom_filter_slab_at(&bloom_filters, c);
if (!filter->data) {
- load_commit_graph_info(r, c);
- if (commit_graph_position(c) != COMMIT_NOT_FROM_GRAPH)
- load_bloom_filter_from_graph(r->objects->commit_graph, filter, c);
+ uint32_t graph_pos;
+ if (repo_find_commit_pos_in_graph(r, c, &graph_pos))
+ load_bloom_filter_from_graph(r->objects->commit_graph,
+ filter, graph_pos);
}
if (filter->data && filter->len)
@@ -811,7 +811,7 @@ test_expect_success 'set up and verify repo with generation data overflow chunk'
graph_git_behavior 'generation data overflow chunk repo' repo left right
-test_expect_failure 'overflow during generation version upgrade' '
+test_expect_success 'overflow during generation version upgrade' '
git init overflow-v2-upgrade &&
(
cd overflow-v2-upgrade &&
The previous commit demonstrates a bug where a commit-graph using generation v2 could enter a state where one of the GDA2 values has its most-significant bit set (indicating that its value should be read from the extended offset table in the GDO2 chunk) without having a GDO2 chunk to read from. This results in the following error message being displayed to the caller: fatal: commit-graph requires overflow generation data but has none This bug arises in the following scenario: - We decide to write a commit-graph using generation number v2, and decide (correctly) that no GDO2 chunk is necessary (e.g., because all of the commiter date offsets are no larger than 2^31-1). - The v2 generation numbers are stored in the `->generation` member of the commit slab holding `struct commit_graph_data`'s. - Later on, `load_commit_graph_info()` is called, overwriting the v2 generation data in the aforementioned slab with any existing v1 generation data. Then, when the commit-graph code goes to write the GDA2 chunk via `write_graph_chunk_generation_data()`, we use the overwritten generation v1 data in a place where we expect to use a v2 generation number: offset = commit_graph_data_at(c)->generation - c->date; ...because `commit_graph_data_at(c)->generation` used to hold the v2 generation data, but it was overwritten to contain the v1 generation number via `load_commit_graph_info()`. If the `offset` computation above overflows the v2 generation number max, then `write_graph_chunk_generation_data()` will update its count of large offsets and write the marker accordingly: if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { offset = CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_data_overflows; num_generation_data_overflows++; } and reads will look for the GDO2 chunk containing the overflowing v2 generation number, *after* the commit-graph code decided that no such chunk was necessary. The main problem is that the slab containing `struct commit_graph_data` has a dual purpose. It is used to hold data that we are about to write to disk while generating a commit-graph, as well as hold data that was read from an existing commit-graph. When the two mix, namely when the result of reading the commit-graph has a side-effect that mixes poorly with an in-progress commit-graph write, we end up with corrupt data. A complete fix might be to introduce a new slab that is used exclusively for writing, and gate access between the two slabs based on context provided by the caller (e.g., whether this computation is part of a "read" or "write" operation). But a more minimal fix addresses the only known path which overwrites the slab data, which is `compute_bloom_filters()` -> `get_or_compute_bloom_filter()` -> `load_commit_graph_info()` -> `fill_commit_graph_info()` by avoiding the last call which clobbers the data altogether. This path only needs to learn the graph position of a given commit so that it can be used in `load_bloom_filter_from_graph()`. By replacing the last steps of the above with one that records the graph position into a temporary variable which is then used to load the existing Bloom data, we eliminate the clobbering, removing the corruption. Signed-off-by: Taylor Blau <me@ttaylorr.com> --- bloom.c | 10 +++++----- t/t5318-commit-graph.sh | 2 +- 2 files changed, 6 insertions(+), 6 deletions(-)