Message ID | 20231009205951.GD3282181@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 52e2e8d43dbae8c05b68efd60cde2aacf3a23890 |
Headers | show |
Series | bounds-checks for chunk-based files | expand |
On Mon, Oct 09, 2023 at 04:59:51PM -0400, Jeff King wrote: > We load the oid fanout chunk with pair_chunk(), which means we never see > the size of the chunk. We just assume the on-disk file uses the > appropriate size, and if it's too small we'll access random memory. > > It's easy to check this up-front; the fanout always consists of 256 > uint32's, since it is a fanout of the first byte of the hash pointing > into the oid index. These parameters can't be changed without > introducing a new chunk type. Cool, this is the first patch that should start reducing our usage of the new pair_chunk_unsafe() and hardening these reads. Let's take a look... > This matches the similar check in the midx OIDF chunk (but note that > rather than checking for the error immediately, the graph code just > leaves parts of the struct NULL and checks for required fields later). > > Signed-off-by: Jeff King <peff@peff.net> > --- > commit-graph.c | 13 +++++++++++-- > t/t5318-commit-graph.sh | 26 ++++++++++++++++++++++++++ > 2 files changed, 37 insertions(+), 2 deletions(-) > > diff --git a/commit-graph.c b/commit-graph.c > index a689a55b79..9b3b01da61 100644 > --- a/commit-graph.c > +++ b/commit-graph.c > @@ -305,6 +305,16 @@ static int verify_commit_graph_lite(struct commit_graph *g) > return 0; > } > > +static int graph_read_oid_fanout(const unsigned char *chunk_start, > + size_t chunk_size, void *data) > +{ > + struct commit_graph *g = data; > + if (chunk_size != 256 * sizeof(uint32_t)) > + return error("commit-graph oid fanout chunk is wrong size"); Should we mark this string for translation? > + g->chunk_oid_fanout = (const uint32_t *)chunk_start; > + return 0; > +} > + Nice. This makes sense and seems like an obvious improvement over the existing code. I wonder how common this pattern is. We have read_chunk() which is for handling more complex scenarios than this. But the safe version of pair_chunk() really just wants to check that the size of the chunk is as expected and assign the location in the mmap to some pointer. Do you think it would be worth changing pair_chunk() to take an expected size_t and handle this generically? I.e. have a version of chunk-format::pair_chunk_fn() that looks something like: static int pair_chunk_fn(const unsigned char *chunk_start, size_t chunk_size, void *data) { const unsigned char **p = data; if (chunk_size != data->size) return -1; *p = chunk_start; return 0; } and then our call here would be: if (pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, (const unsigned char **)&graph->chunk_oid_fanout, 256 * sizeof(uint32_t)) < 0) return error("commit-graph oid fanout chunk is wrong size"); I dunno. It's hard to have a more concrete recomendation without having read the rest of the series. So it's possible that this is just complete nonsense ;-). But my hunch is that there are a number of callers that would benefit from having this built in. Thanks, Taylor
On Tue, Oct 10, 2023 at 08:08:05PM -0400, Taylor Blau wrote: > Do you think it would be worth changing pair_chunk() to take an expected > size_t and handle this generically? I.e. have a version of > chunk-format::pair_chunk_fn() that looks something like: > > static int pair_chunk_fn(const unsigned char *chunk_start, > size_t chunk_size, void *data) > { > const unsigned char **p = data; > if (chunk_size != data->size) > return -1; > *p = chunk_start; > return 0; > } > > and then our call here would be: > > if (pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, > (const unsigned char **)&graph->chunk_oid_fanout, > 256 * sizeof(uint32_t)) < 0) > return error("commit-graph oid fanout chunk is wrong size"); I took a brief stab at implementing this tonight and came up with this, which applies on top of this patch. Looking through the rest of the series briefly[^1], I think having something like this would be useful: --- 8< --- diff --git a/chunk-format.c b/chunk-format.c index 8d910e23f6..38b0f847be 100644 --- a/chunk-format.c +++ b/chunk-format.c @@ -157,6 +157,8 @@ int read_table_of_contents(struct chunkfile *cf, struct pair_chunk_data { const unsigned char **p; size_t *size; + + size_t expected_size; }; static int pair_chunk_fn(const unsigned char *chunk_start, @@ -169,6 +171,20 @@ static int pair_chunk_fn(const unsigned char *chunk_start, return 0; } +static int pair_chunk_expect_fn(const unsigned char *chunk_start, + size_t chunk_size, + void *data) +{ + struct pair_chunk_data *pcd = data; + if (pcd->expected_size != chunk_size) + return error(_("mismatched chunk size, got: %"PRIuMAX", wanted:" + " %"PRIuMAX), + (uintmax_t)chunk_size, + (uintmax_t)pcd->expected_size); + *pcd->p = chunk_start; + return 0; +} + int pair_chunk(struct chunkfile *cf, uint32_t chunk_id, const unsigned char **p, @@ -178,6 +194,14 @@ int pair_chunk(struct chunkfile *cf, return read_chunk(cf, chunk_id, pair_chunk_fn, &pcd); } +int pair_chunk_expect(struct chunkfile *cf, + uint32_t chunk_id, const unsigned char **p, + size_t expected_size) +{ + struct pair_chunk_data pcd = { .p = p, .expected_size = expected_size }; + return read_chunk(cf, chunk_id, pair_chunk_expect_fn, &pcd); +} + int pair_chunk_unsafe(struct chunkfile *cf, uint32_t chunk_id, const unsigned char **p) diff --git a/chunk-format.h b/chunk-format.h index 8dce7967f4..778f81f555 100644 --- a/chunk-format.h +++ b/chunk-format.h @@ -53,6 +53,10 @@ int pair_chunk(struct chunkfile *cf, const unsigned char **p, size_t *size); +int pair_chunk_expect(struct chunkfile *cf, + uint32_t chunk_id, const unsigned char **p, + size_t expected_size); + /* * Unsafe version of pair_chunk; it does not return the size, * meaning that the caller cannot possibly be careful about diff --git a/commit-graph.c b/commit-graph.c index 9b3b01da61..ed85161fdb 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -305,16 +305,6 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } -static int graph_read_oid_fanout(const unsigned char *chunk_start, - size_t chunk_size, void *data) -{ - struct commit_graph *g = data; - if (chunk_size != 256 * sizeof(uint32_t)) - return error("commit-graph oid fanout chunk is wrong size"); - g->chunk_oid_fanout = (const uint32_t *)chunk_start; - return 0; -} - static int graph_read_oid_lookup(const unsigned char *chunk_start, size_t chunk_size, void *data) { @@ -404,7 +394,10 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, GRAPH_HEADER_SIZE, graph->num_chunks)) goto free_and_return; - read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph); + if (pair_chunk_expect(cf, GRAPH_CHUNKID_OIDFANOUT, + (const unsigned char **)&graph->chunk_oid_fanout, + 256 * sizeof(uint32_t)) < 0) + error(_("commit-graph oid fanout chunk is wrong size")); read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph); pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data); pair_chunk_unsafe(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges); diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index d25bea3ec5..467b5f5e9c 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -841,6 +841,7 @@ test_expect_success 'reader notices too-small oid fanout chunk' ' # otherwise we hit an earlier check check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) && cat >expect.err <<-\EOF && + error: mismatched chunk size, got: 1000, wanted: 1024 error: commit-graph oid fanout chunk is wrong size error: commit-graph is missing the OID Fanout chunk EOF --- >8 --- Or, quite honestly, having the "expected_size" parameter be required in the safe version of `pair_chunk()`. Thanks, Taylor [^1]: A non-brief review is on my to-do list for tomorrow.
On Tue, Oct 10, 2023 at 08:08:05PM -0400, Taylor Blau wrote: > Nice. This makes sense and seems like an obvious improvement over the > existing code. > > I wonder how common this pattern is. We have read_chunk() which is for > handling more complex scenarios than this. But the safe version of > pair_chunk() really just wants to check that the size of the chunk is as > expected and assign the location in the mmap to some pointer. Sometimes yes, sometimes no. For fixed-size ones like this, that's sufficient. For others we have to record the size and use it for later bounds-checking. IIRC it's about 50/50 between the two. > Do you think it would be worth changing pair_chunk() to take an expected > size_t and handle this generically? I.e. have a version of > chunk-format::pair_chunk_fn() that looks something like: > > static int pair_chunk_fn(const unsigned char *chunk_start, > size_t chunk_size, void *data) > { > const unsigned char **p = data; > if (chunk_size != data->size) > return -1; > *p = chunk_start; > return 0; > } > > and then our call here would be: > > if (pair_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, > (const unsigned char **)&graph->chunk_oid_fanout, > 256 * sizeof(uint32_t)) < 0) > return error("commit-graph oid fanout chunk is wrong size"); > > I dunno. It's hard to have a more concrete recomendation without having > read the rest of the series. So it's possible that this is just complete > nonsense ;-). But my hunch is that there are a number of callers that > would benefit from having this built in. I don't think it's nonsense, and I do think other callers would benefit. On the other hand, I kind of like the notion that there is a complete validation callback for each of these chunks. Even though it just checks the size for now, it could handle other things. In the case of OIDF, for example, we can check whether the entries are monotonic. It's just that we happen to do those checks elsewhere. Hmm, actually, looking at that again, I think I may have missed a case in patch 6. For pack .idx files, we check the fanout table when they are loaded. And patch 6 adds the same for commit-graph files. I thought midx files were handled the same .idx, but looking at it again, I only see the monotonicity check in the "multi-pack-index verify" code paths. So it might need the same treatment. I'm not sure how I missed that (I started by making a corrupted midx first and couldn't get it to fail, which is when I discovered the existing checks, but maybe I am mixing up .idx and midx in my memory). -Peff
diff --git a/commit-graph.c b/commit-graph.c index a689a55b79..9b3b01da61 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -305,6 +305,16 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 0; } +static int graph_read_oid_fanout(const unsigned char *chunk_start, + size_t chunk_size, void *data) +{ + struct commit_graph *g = data; + if (chunk_size != 256 * sizeof(uint32_t)) + return error("commit-graph oid fanout chunk is wrong size"); + g->chunk_oid_fanout = (const uint32_t *)chunk_start; + return 0; +} + static int graph_read_oid_lookup(const unsigned char *chunk_start, size_t chunk_size, void *data) { @@ -394,8 +404,7 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s, GRAPH_HEADER_SIZE, graph->num_chunks)) goto free_and_return; - pair_chunk_unsafe(cf, GRAPH_CHUNKID_OIDFANOUT, - (const unsigned char **)&graph->chunk_oid_fanout); + read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph); read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph); pair_chunk_unsafe(cf, GRAPH_CHUNKID_DATA, &graph->chunk_commit_data); pair_chunk_unsafe(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges); diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index ba65f17dd9..d25bea3ec5 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -2,6 +2,7 @@ test_description='commit graph' . ./test-lib.sh +. "$TEST_DIRECTORY"/lib-chunk.sh GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0 @@ -821,4 +822,29 @@ test_expect_success 'overflow during generation version upgrade' ' ) ' +corrupt_chunk () { + graph=full/.git/objects/info/commit-graph && + test_when_finished "rm -rf $graph" && + git -C full commit-graph write --reachable && + corrupt_chunk_file $graph "$@" +} + +check_corrupt_chunk () { + corrupt_chunk "$@" && + git -C full -c core.commitGraph=false log >expect.out && + git -C full -c core.commitGraph=true log >out 2>err && + test_cmp expect.out out +} + +test_expect_success 'reader notices too-small oid fanout chunk' ' + # make it big enough that the graph file is plausible, + # otherwise we hit an earlier check + check_corrupt_chunk OIDF clear $(printf "000000%02x" $(test_seq 250)) && + cat >expect.err <<-\EOF && + error: commit-graph oid fanout chunk is wrong size + error: commit-graph is missing the OID Fanout chunk + EOF + test_cmp expect.err err +' + test_done
We load the oid fanout chunk with pair_chunk(), which means we never see the size of the chunk. We just assume the on-disk file uses the appropriate size, and if it's too small we'll access random memory. It's easy to check this up-front; the fanout always consists of 256 uint32's, since it is a fanout of the first byte of the hash pointing into the oid index. These parameters can't be changed without introducing a new chunk type. This matches the similar check in the midx OIDF chunk (but note that rather than checking for the error immediately, the graph code just leaves parts of the struct NULL and checks for required fields later). Signed-off-by: Jeff King <peff@peff.net> --- commit-graph.c | 13 +++++++++++-- t/t5318-commit-graph.sh | 26 ++++++++++++++++++++++++++ 2 files changed, 37 insertions(+), 2 deletions(-)