diff mbox series

[14/20] commit-graph: bounds-check base graphs chunk

Message ID 20231009210541.GN3282181@coredump.intra.peff.net (mailing list archive)
State Accepted
Commit 6cf61d0db55291c3b8406a6ba8f20fdfb9a4a344
Headers show
Series bounds-checks for chunk-based files | expand

Commit Message

Jeff King Oct. 9, 2023, 9:05 p.m. UTC
When we are loading a commit-graph chain, we check that each slice of the
chain points to the appropriate set of base graphs via its BASE chunk.
But since we don't record the size of the chunk, we may access
out-of-bounds memory if the file is corrupted.

Since we know the number of entries we expect to find (based on the
position within the commit-graph-chain file), we can just check the size
up front.

In theory this would also let us drop the st_mult() call a few lines
later when we actually access the memory, since we know that the
computed offset will fit in a size_t. But because the operands
"g->hash_len" and "n" have types "unsigned char" and "int", we'd have to
cast to size_t first. Leaving the st_mult() does that cast, and makes it
more obvious that we don't have an overflow problem.

Note that the test does not actually segfault before this patch, since
it just reads garbage from the chunk after BASE (and indeed, it even
rejects the file because that garbage does not have the expected hash
value). You could construct a file with BASE at the end that did
segfault, but corrupting the existing one is easy, and we can check
stderr for the expected message.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c                |  8 +++++++-
 commit-graph.h                |  1 +
 t/t5324-split-commit-graph.sh | 14 ++++++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

Comments

Taylor Blau Oct. 11, 2023, 7:05 p.m. UTC | #1
On Mon, Oct 09, 2023 at 05:05:41PM -0400, Jeff King wrote:
> When we are loading a commit-graph chain, we check that each slice of the
> chain points to the appropriate set of base graphs via its BASE chunk.
> But since we don't record the size of the chunk, we may access
> out-of-bounds memory if the file is corrupted.
>
> Since we know the number of entries we expect to find (based on the
> position within the commit-graph-chain file), we can just check the size
> up front.
>
> In theory this would also let us drop the st_mult() call a few lines
> later when we actually access the memory, since we know that the
> computed offset will fit in a size_t. But because the operands
> "g->hash_len" and "n" have types "unsigned char" and "int", we'd have to
> cast to size_t first. Leaving the st_mult() does that cast, and makes it
> more obvious that we don't have an overflow problem.
>
> Note that the test does not actually segfault before this patch, since
> it just reads garbage from the chunk after BASE (and indeed, it even
> rejects the file because that garbage does not have the expected hash
> value). You could construct a file with BASE at the end that did
> segfault, but corrupting the existing one is easy, and we can check
> stderr for the expected message.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c                |  8 +++++++-
>  commit-graph.h                |  1 +
>  t/t5324-split-commit-graph.sh | 14 ++++++++++++++
>  3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index e4860841fc..4377b547c8 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -435,7 +435,8 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
>  	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
>  	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges,
>  		   &graph->chunk_extra_edges_size);
> -	pair_chunk_unsafe(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
> +	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs,
> +		   &graph->chunk_base_graphs_size);
>
>  	if (s->commit_graph_generation_version >= 2) {
>  		pair_chunk_unsafe(cf, GRAPH_CHUNKID_GENERATION_DATA,
> @@ -546,6 +547,11 @@ static int add_graph_to_chain(struct commit_graph *g,
>  		return 0;
>  	}
>
> +	if (g->chunk_base_graphs_size / g->hash_len < n) {
> +		warning(_("commit-graph base graphs chunk is too small"));
> +		return 0;
> +	}
> +

Nice. Here's a spot where we would not benefit from a function like
`pair_chunk_expect()`, since we don't know about the chain when we are
parsing an individual layer of it. So storing the length off to the side
and checking it within `add_graph_to_chain()` makes sense.

Thanks,
Taylor
diff mbox series

Patch

diff --git a/commit-graph.c b/commit-graph.c
index e4860841fc..4377b547c8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -435,7 +435,8 @@  struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges,
 		   &graph->chunk_extra_edges_size);
-	pair_chunk_unsafe(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
+	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs,
+		   &graph->chunk_base_graphs_size);
 
 	if (s->commit_graph_generation_version >= 2) {
 		pair_chunk_unsafe(cf, GRAPH_CHUNKID_GENERATION_DATA,
@@ -546,6 +547,11 @@  static int add_graph_to_chain(struct commit_graph *g,
 		return 0;
 	}
 
+	if (g->chunk_base_graphs_size / g->hash_len < n) {
+		warning(_("commit-graph base graphs chunk is too small"));
+		return 0;
+	}
+
 	while (n) {
 		n--;
 
diff --git a/commit-graph.h b/commit-graph.h
index 1f8a9de4fb..e4248ea05d 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -97,6 +97,7 @@  struct commit_graph {
 	const unsigned char *chunk_extra_edges;
 	size_t chunk_extra_edges_size;
 	const unsigned char *chunk_base_graphs;
+	size_t chunk_base_graphs_size;
 	const unsigned char *chunk_bloom_indexes;
 	const unsigned char *chunk_bloom_data;
 
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 55b5765e2d..3c8482d073 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -2,6 +2,7 @@ 
 
 test_description='split commit graph'
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-chunk.sh
 
 GIT_TEST_COMMIT_GRAPH=0
 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
@@ -398,6 +399,19 @@  test_expect_success 'verify across alternates' '
 	)
 '
 
+test_expect_success 'reader bounds-checks base-graph chunk' '
+	git clone --no-hardlinks . corrupt-base-chunk &&
+	(
+		cd corrupt-base-chunk &&
+		tip_file=$graphdir/graph-$(tail -n 1 $graphdir/commit-graph-chain).graph &&
+		corrupt_chunk_file "$tip_file" BASE clear 01020304 &&
+		git -c core.commitGraph=false log >expect.out &&
+		git -c core.commitGraph=true log >out 2>err &&
+		test_cmp expect.out out &&
+		grep "commit-graph base graphs chunk is too small" err
+	)
+'
+
 test_expect_success 'add octopus merge' '
 	git reset --hard commits/10 &&
 	git merge commits/3 commits/4 &&