[14/20] commit-graph: bounds-check base graphs chunk

Message ID	20231009210541.GN3282181@coredump.intra.peff.net (mailing list archive)
State	Accepted
Commit	6cf61d0db55291c3b8406a6ba8f20fdfb9a4a344
Headers	Return-Path: <git-owner@vger.kernel.org>
	Date: Mon, 9 Oct 2023 17:05:41 -0400
	From: Jeff King <peff@peff.net>
	To: git@vger.kernel.org
	Cc: Taylor Blau <me@ttaylorr.com>
	Subject: [PATCH 14/20] commit-graph: bounds-check base graphs chunk
	Message-ID: <20231009210541.GN3282181@coredump.intra.peff.net>
	References: <20231009205544.GA3281950@coredump.intra.peff.net>
	MIME-Version: 1.0
	Content-Type: text/plain; charset=utf-8
	Content-Disposition: inline
	In-Reply-To: <20231009205544.GA3281950@coredump.intra.peff.net>
	Precedence: bulk
Series	bounds-checks for chunk-based files
	[0/20] bounds-checks for chunk-based files
	[01/20] chunk-format: note that pair_chunk() is unsafe
	[02/20] t: add library for munging chunk-format files
	[03/20] midx: stop ignoring malformed oid fanout chunk
	[04/20] commit-graph: check size of oid fanout chunk
	[05/20] midx: check size of oid lookup chunk
	[06/20] commit-graph: check consistency of fanout table
	[07/20] midx: check size of pack names chunk
	[08/20] midx: enforce chunk alignment on reading
	[09/20] midx: check size of object offset chunk
	[10/20] midx: bounds-check large offset chunk
	[11/20] midx: check size of revindex chunk
	[12/20] commit-graph: check size of commit data chunk
	[13/20] commit-graph: detect out-of-bounds extra-edges pointers
	[14/20] commit-graph: bounds-check base graphs chunk
	[15/20] commit-graph: check size of generations chunk
	[16/20] commit-graph: bounds-check generation overflow chunk
	[17/20] commit-graph: check bounds when accessing BDAT chunk
	[18/20] commit-graph: check bounds when accessing BIDX chunk
	[19/20] commit-graph: detect out-of-order BIDX offsets
	[20/20] chunk-format: drop pair_chunk_unsafe()
	[21/20] t5319: make corrupted large-offset test more robust

Commit Message

Jeff King Oct. 9, 2023, 9:05 p.m. UTC

When we are loading a commit-graph chain, we check that each slice of the
chain points to the appropriate set of base graphs via its BASE chunk.
But since we don't record the size of the chunk, we may access
out-of-bounds memory if the file is corrupted.
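As a rough illustration of what recording the size of a chunk involves, the sketch below models a simplified chunk table of contents: each entry stores a chunk ID and a start offset, and a terminating entry with ID 0 stores where the last chunk ends, so a chunk's size is the next entry's offset minus its own. The names `toc_entry` and `find_chunk()` are hypothetical; this is not git's chunk-format.c API, only a model of the idea behind pair_chunk() handing back a size alongside the pointer.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified table of contents entry: a chunk ID and the
 * offset where that chunk starts. A final entry with ID 0 records where
 * the last chunk ends, so each chunk's size is the next entry's offset
 * minus its own.
 */
struct toc_entry {
	uint32_t id;
	uint64_t offset;
};

/*
 * Look up chunk `id`, reporting both where it starts and how big it is.
 * Returns 0 on success, or -1 if the chunk is missing or its offsets
 * are out of order or run past the end of the file.
 */
static int find_chunk(const struct toc_entry *toc,
		      const unsigned char *file, size_t file_len,
		      uint32_t id,
		      const unsigned char **data, size_t *size)
{
	for (size_t i = 0; toc[i].id; i++) {
		uint64_t start = toc[i].offset;
		uint64_t end = toc[i + 1].offset;

		if (toc[i].id != id)
			continue;
		if (end < start || end > file_len)
			return -1;
		*data = file + start;
		*size = (size_t)(end - start);
		return 0;
	}
	return -1;
}
```

For example, with a BASE entry at offset 16 followed by another chunk at offset 56, find_chunk() reports a 40-byte BASE chunk, which is room for two 20-byte hashes.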

Since we know the number of entries we expect to find (based on the
position within the commit-graph-chain file), we can just check the size
up front.
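The up-front check can be modeled as a standalone predicate. This is a sketch rather than the patch's code: `base_chunk_big_enough()` is a hypothetical name, and rejecting a zero `hash_len` or a negative `n` is an added assumption. Dividing the chunk size by the hash length, as the patch does, keeps the comparison itself free of overflow, whereas computing `n * hash_len` first could overflow for a large `n`.

```c
#include <stddef.h>

/*
 * Hypothetical model of the size check in add_graph_to_chain(): a layer
 * that must name n base graphs needs n hashes of hash_len bytes each.
 * Returns 1 if the chunk is big enough, 0 otherwise.
 *
 * Dividing the size by hash_len (instead of multiplying n by hash_len)
 * means the comparison cannot overflow.
 */
static int base_chunk_big_enough(size_t chunk_size, unsigned char hash_len,
				 int n)
{
	if (!hash_len || n < 0)
		return 0; /* assumption: treat nonsense input as corrupt */
	return chunk_size / hash_len >= (size_t)n;
}
```

With 20-byte hashes, a 40-byte chunk satisfies `n = 2` while a 39-byte chunk does not; trailing bytes beyond the last full entry are simply ignored by a check of this shape.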

In theory this would also let us drop the st_mult() call a few lines
later when we actually access the memory, since we know that the
computed offset will fit in a size_t. But because the operands
"g->hash_len" and "n" have types "unsigned char" and "int", we'd have to
cast to size_t first. Leaving the st_mult() in place does that cast, and makes it
more obvious that we don't have an overflow problem.
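The point about the operand types can be seen in a small sketch. Here `checked_mult()` is a hypothetical stand-in that reports overflow instead of dying the way git's st_mult() does, and `base_entry_offset()` is likewise a hypothetical helper. The explicit casts matter: in a plain `hash_len * n`, the `unsigned char` operand is promoted to `int`, so the multiplication happens in `int` arithmetic rather than in `size_t`.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical non-dying analogue of st_mult(): multiply two size_t
 * values, storing the product and returning 0, or returning -1 if the
 * product would not fit in a size_t.
 */
static int checked_mult(size_t a, size_t b, size_t *out)
{
	if (a && b > SIZE_MAX / a)
		return -1;
	*out = a * b;
	return 0;
}

/*
 * Byte offset of base-graph entry n. Both operands are converted to
 * size_t before multiplying; without the casts, hash_len would be
 * promoted to int and the product computed as an int.
 */
static int base_entry_offset(unsigned char hash_len, int n, size_t *out)
{
	if (n < 0)
		return -1;
	return checked_mult((size_t)hash_len, (size_t)n, out);
}
```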

Note that the test does not actually segfault before this patch, since
it just reads garbage from the chunk after BASE (and indeed, it even
rejects the file because that garbage does not have the expected hash
value). You could construct a file with BASE at the end that did
segfault, but corrupting the existing one is easy, and we can check
stderr for the expected message.
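Why the test reads garbage rather than crashing can be modeled with a byte buffer in which a 4-byte BASE chunk is immediately followed by another chunk's data. The constant `SKETCH_HASH_LEN` (20, the SHA-1 hash length) and both helpers are assumptions for illustration: the unchecked reader copies a full hash starting at the BASE pointer and so picks up bytes from the neighboring chunk, while the checked reader consults the recorded size and refuses.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_HASH_LEN 20 /* assumption: SHA-1-sized hashes */

/*
 * Unchecked read: always copies SKETCH_HASH_LEN bytes from the start of
 * the chunk, whether or not the chunk is that long.
 */
static void read_base_entry_unchecked(const unsigned char *base,
				      unsigned char *out)
{
	memcpy(out, base, SKETCH_HASH_LEN);
}

/*
 * Checked read: returns -1 (the "base graphs chunk is too small" case)
 * if the recorded chunk size cannot hold one entry, else copies it.
 */
static int read_base_entry(const unsigned char *base, size_t base_size,
			   unsigned char *out)
{
	if (base_size < SKETCH_HASH_LEN)
		return -1;
	memcpy(out, base, SKETCH_HASH_LEN);
	return 0;
}
```

With a buffer laid out that way, the unchecked copy yields the four BASE bytes followed by 16 bytes of the next chunk, which will not match the expected base graph hash; only a BASE chunk at the very end of the mapped file would push the unchecked read past the end of the mapping.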

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c                |  8 +++++++-
 commit-graph.h                |  1 +
 t/t5324-split-commit-graph.sh | 14 ++++++++++++++
 3 files changed, 22 insertions(+), 1 deletion(-)

Comments

Taylor Blau Oct. 11, 2023, 7:05 p.m. UTC

On Mon, Oct 09, 2023 at 05:05:41PM -0400, Jeff King wrote:
> When we are loading a commit-graph chain, we check that each slice of the
> chain points to the appropriate set of base graphs via its BASE chunk.
> But since we don't record the size of the chunk, we may access
> out-of-bounds memory if the file is corrupted.
>
> Since we know the number of entries we expect to find (based on the
> position within the commit-graph-chain file), we can just check the size
> up front.
>
> In theory this would also let us drop the st_mult() call a few lines
> later when we actually access the memory, since we know that the
> computed offset will fit in a size_t. But because the operands
> "g->hash_len" and "n" have types "unsigned char" and "int", we'd have to
> cast to size_t first. Leaving the st_mult() does that cast, and makes it
> more obvious that we don't have an overflow problem.
>
> Note that the test does not actually segfault before this patch, since
> it just reads garbage from the chunk after BASE (and indeed, it even
> rejects the file because that garbage does not have the expected hash
> value). You could construct a file with BASE at the end that did
> segfault, but corrupting the existing one is easy, and we can check
> stderr for the expected message.
>
> Signed-off-by: Jeff King <peff@peff.net>
> ---
>  commit-graph.c                |  8 +++++++-
>  commit-graph.h                |  1 +
>  t/t5324-split-commit-graph.sh | 14 ++++++++++++++
>  3 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/commit-graph.c b/commit-graph.c
> index e4860841fc..4377b547c8 100644
> --- a/commit-graph.c
> +++ b/commit-graph.c
> @@ -435,7 +435,8 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
>  	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
>  	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges,
>  		   &graph->chunk_extra_edges_size);
> -	pair_chunk_unsafe(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
> +	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs,
> +		   &graph->chunk_base_graphs_size);
>
>  	if (s->commit_graph_generation_version >= 2) {
>  		pair_chunk_unsafe(cf, GRAPH_CHUNKID_GENERATION_DATA,
> @@ -546,6 +547,11 @@ static int add_graph_to_chain(struct commit_graph *g,
>  		return 0;
>  	}
>
> +	if (g->chunk_base_graphs_size / g->hash_len < n) {
> +		warning(_("commit-graph base graphs chunk is too small"));
> +		return 0;
> +	}
> +

Nice. Here's a spot where we would not benefit from a function like
`pair_chunk_expect()`, since we don't know about the chain when we are
parsing an individual layer of it. So storing the length off to the side
and checking it within `add_graph_to_chain()` makes sense.
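The expected entry count is only known once a layer's position in the chain is known, which suggests a two-phase design: record the pointer and size while parsing each layer, then validate each layer as the chain is assembled. In the sketch below, `struct layer` and `first_short_layer()` are hypothetical names, and layer `i` (counting from the bottom of the chain) is assumed to need at least `i` base-graph entries.

```c
#include <stddef.h>

/*
 * Hypothetical per-layer state: what a parser can record about a layer
 * without knowing where it sits in the chain.
 */
struct layer {
	const unsigned char *base_graphs; /* start of BASE chunk, or NULL */
	size_t base_graphs_size;          /* size of BASE chunk in bytes */
	unsigned char hash_len;
};

/*
 * Second phase: the layer at index i (0 = bottom) names the i layers
 * below it, so its BASE chunk needs at least i * hash_len bytes.
 * Returns the index of the first layer that is too small, or -1 if
 * every layer passes.
 */
static int first_short_layer(const struct layer *chain, int nr)
{
	for (int i = 0; i < nr; i++) {
		const struct layer *l = &chain[i];

		if (!l->hash_len ||
		    l->base_graphs_size / l->hash_len < (size_t)i)
			return i;
	}
	return -1;
}
```

For a three-layer chain with 20-byte hashes, BASE sizes of 0, 20 and 40 bytes all pass, while shrinking the top layer's chunk to 4 bytes makes it the first short layer.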

Thanks,
Taylor

Patch

diff --git a/commit-graph.c b/commit-graph.c
index e4860841fc..4377b547c8 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -435,7 +435,8 @@  struct commit_graph *parse_commit_graph(struct repo_settings *s,
 	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges,
 		   &graph->chunk_extra_edges_size);
-	pair_chunk_unsafe(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs);
+	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs,
+		   &graph->chunk_base_graphs_size);
 
 	if (s->commit_graph_generation_version >= 2) {
 		pair_chunk_unsafe(cf, GRAPH_CHUNKID_GENERATION_DATA,
@@ -546,6 +547,11 @@  static int add_graph_to_chain(struct commit_graph *g,
 		return 0;
 	}
 
+	if (g->chunk_base_graphs_size / g->hash_len < n) {
+		warning(_("commit-graph base graphs chunk is too small"));
+		return 0;
+	}
+
 	while (n) {
 		n--;
 
diff --git a/commit-graph.h b/commit-graph.h
index 1f8a9de4fb..e4248ea05d 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -97,6 +97,7 @@  struct commit_graph {
 	const unsigned char *chunk_extra_edges;
 	size_t chunk_extra_edges_size;
 	const unsigned char *chunk_base_graphs;
+	size_t chunk_base_graphs_size;
 	const unsigned char *chunk_bloom_indexes;
 	const unsigned char *chunk_bloom_data;
 
diff --git a/t/t5324-split-commit-graph.sh b/t/t5324-split-commit-graph.sh
index 55b5765e2d..3c8482d073 100755
--- a/t/t5324-split-commit-graph.sh
+++ b/t/t5324-split-commit-graph.sh
@@ -2,6 +2,7 @@ 
 
 test_description='split commit graph'
 . ./test-lib.sh
+. "$TEST_DIRECTORY"/lib-chunk.sh
 
 GIT_TEST_COMMIT_GRAPH=0
 GIT_TEST_COMMIT_GRAPH_CHANGED_PATHS=0
@@ -398,6 +399,19 @@  test_expect_success 'verify across alternates' '
 	)
 '
 
+test_expect_success 'reader bounds-checks base-graph chunk' '
+	git clone --no-hardlinks . corrupt-base-chunk &&
+	(
+		cd corrupt-base-chunk &&
+		tip_file=$graphdir/graph-$(tail -n 1 $graphdir/commit-graph-chain).graph &&
+		corrupt_chunk_file "$tip_file" BASE clear 01020304 &&
+		git -c core.commitGraph=false log >expect.out &&
+		git -c core.commitGraph=true log >out 2>err &&
+		test_cmp expect.out out &&
+		grep "commit-graph base graphs chunk is too small" err
+	)
+'
+
 test_expect_success 'add octopus merge' '
 	git reset --hard commits/10 &&
 	git merge commits/3 commits/4 &&
