Message ID | 20231109071711.GE2698043@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 8bd40ed2aed2baee235299cdbf8482c752f980a3 |
Series | some more chunk-file bounds-checks fixes |
On Thu, Nov 09, 2023 at 02:17:11AM -0500, Jeff King wrote:
> The code to read commit-graph files tries to read all of the required
> chunks, but doesn't abort if we can't find one (or if it's corrupted).
> It's only at the end of reading the file that we then do some sanity
> checks for NULL entries. But it's preferable to detect the errors and
> bail immediately, for a few reasons:
>
>   1. It's less error-prone. It's easy in the reader functions to flag an
>      error but still end up setting some struct fields (an error I in
>      fact made while working on this patch series).
>
>   2. It's safer. Since verifying some chunks depends on the values of
>      other chunks, we may be depending on not-yet-verified data. I don't
>      know offhand of any case where this can cause problems, but it's
>      one less subtle thing to worry about in the reader code.
>
>   3. It prevents the user from seeing nonsense errors. If we're missing
>      an OIDL chunk, then g->num_commits will be zero. And so we may
>      complain that the size of our CDAT chunk (which should have a
>      fixed-size record for each commit) is wrong unless it's also zero.
>      But that's misleading; the problem is the missing OIDL chunk; the
>      CDAT one might be fine!
>
> So let's just check the return value from read_chunk(). This is exactly
> how the midx chunk-reading code does it.

All very well explained. I hit that same snag as you did when I was
working on the few patches I proposed we put on top of your earlier
chunk-format hardening series.

I'm glad to see this getting cleaned up, and I'm very happy with the
post-image of this patch.

Thanks,
Taylor
diff --git a/commit-graph.c b/commit-graph.c
index 989ebbe816..374575b484 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -291,19 +291,6 @@ static int verify_commit_graph_lite(struct commit_graph *g)
 	 * over g->num_commits, or runs a checksum on the commit-graph
 	 * itself.
 	 */
-	if (!g->chunk_oid_fanout) {
-		error(_("commit-graph required OID fanout chunk missing or corrupted"));
-		return 1;
-	}
-	if (!g->chunk_oid_lookup) {
-		error(_("commit-graph required OID lookup chunk missing or corrupted"));
-		return 1;
-	}
-	if (!g->chunk_commit_data) {
-		error(_("commit-graph required commit data chunk missing or corrupted"));
-		return 1;
-	}
-
 	for (i = 0; i < 255; i++) {
 		uint32_t oid_fanout1 = ntohl(g->chunk_oid_fanout[i]);
 		uint32_t oid_fanout2 = ntohl(g->chunk_oid_fanout[i + 1]);
@@ -462,9 +449,19 @@ struct commit_graph *parse_commit_graph(struct repo_settings *s,
 				   GRAPH_HEADER_SIZE, graph->num_chunks, 1))
 		goto free_and_return;
 
-	read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph);
-	read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph);
-	read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph);
+	if (read_chunk(cf, GRAPH_CHUNKID_OIDFANOUT, graph_read_oid_fanout, graph)) {
+		error(_("commit-graph required OID fanout chunk missing or corrupted"));
+		goto free_and_return;
+	}
+	if (read_chunk(cf, GRAPH_CHUNKID_OIDLOOKUP, graph_read_oid_lookup, graph)) {
+		error(_("commit-graph required OID lookup chunk missing or corrupted"));
+		goto free_and_return;
+	}
+	if (read_chunk(cf, GRAPH_CHUNKID_DATA, graph_read_commit_data, graph)) {
+		error(_("commit-graph required commit data chunk missing or corrupted"));
+		goto free_and_return;
+	}
+
 	pair_chunk(cf, GRAPH_CHUNKID_EXTRAEDGES, &graph->chunk_extra_edges,
 		   &graph->chunk_extra_edges_size);
 	pair_chunk(cf, GRAPH_CHUNKID_BASE, &graph->chunk_base_graphs,
The code to read commit-graph files tries to read all of the required
chunks, but doesn't abort if we can't find one (or if it's corrupted).
It's only at the end of reading the file that we then do some sanity
checks for NULL entries. But it's preferable to detect the errors and
bail immediately, for a few reasons:

  1. It's less error-prone. It's easy in the reader functions to flag an
     error but still end up setting some struct fields (an error I in
     fact made while working on this patch series).

  2. It's safer. Since verifying some chunks depends on the values of
     other chunks, we may be depending on not-yet-verified data. I don't
     know offhand of any case where this can cause problems, but it's
     one less subtle thing to worry about in the reader code.

  3. It prevents the user from seeing nonsense errors. If we're missing
     an OIDL chunk, then g->num_commits will be zero. And so we may
     complain that the size of our CDAT chunk (which should have a
     fixed-size record for each commit) is wrong unless it's also zero.
     But that's misleading; the problem is the missing OIDL chunk; the
     CDAT one might be fine!

So let's just check the return value from read_chunk(). This is exactly
how the midx chunk-reading code does it.

Signed-off-by: Jeff King <peff@peff.net>
---
 commit-graph.c | 29 +++++++++++++----------------
 1 file changed, 13 insertions(+), 16 deletions(-)
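
To make reason 3 concrete, here is a minimal standalone sketch of the two strategies. It is not git's code: every `toy_*` name, the error enum, and the sizes (20-byte hashes, 36-byte commit records) are assumptions made up for illustration. It models only the behavior described above: a missing chunk makes the read return nonzero without printing anything, and the commit-data reader checks its size against `num_commits`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define TOY_HASH_LEN 20                    /* assumed hash size */
#define TOY_DATA_WIDTH (TOY_HASH_LEN + 16) /* assumed per-commit record size */

enum toy_err {
	TOY_OK = 0, TOY_ERR_FANOUT_SIZE, TOY_ERR_DATA_SIZE,
	TOY_ERR_MISSING_FANOUT, TOY_ERR_MISSING_LOOKUP, TOY_ERR_MISSING_DATA
};

struct toy_chunk { const unsigned char *start; size_t size; }; /* start == NULL: absent */
struct toy_chunkfile { struct toy_chunk fanout, lookup, data; };

struct toy_graph {
	const unsigned char *chunk_oid_fanout, *chunk_oid_lookup, *chunk_commit_data;
	size_t num_commits;
	enum toy_err first_error; /* first diagnostic the user would see */
};

typedef int (*toy_read_fn)(const struct toy_chunk *c, struct toy_graph *g);

static void toy_error(struct toy_graph *g, enum toy_err code, const char *msg)
{
	if (g->first_error == TOY_OK)
		g->first_error = code;
	fprintf(stderr, "error: %s\n", msg);
}

static int read_fanout(const struct toy_chunk *c, struct toy_graph *g)
{
	if (c->size != 256 * sizeof(uint32_t)) {
		toy_error(g, TOY_ERR_FANOUT_SIZE, "fanout chunk is wrong size");
		return 1;
	}
	g->chunk_oid_fanout = c->start;
	return 0;
}

static int read_lookup(const struct toy_chunk *c, struct toy_graph *g)
{
	g->chunk_oid_lookup = c->start;
	g->num_commits = c->size / TOY_HASH_LEN;
	return 0;
}

static int read_data(const struct toy_chunk *c, struct toy_graph *g)
{
	/* Depends on num_commits, which only the lookup reader sets. */
	if (c->size / TOY_DATA_WIDTH != g->num_commits) {
		toy_error(g, TOY_ERR_DATA_SIZE, "commit data chunk is wrong size");
		return 1;
	}
	g->chunk_commit_data = c->start;
	return 0;
}

/* A missing chunk yields a nonzero return and no message of its own. */
static int toy_read_chunk(const struct toy_chunk *c, toy_read_fn fn, struct toy_graph *g)
{
	return c->start ? fn(c, g) : 1;
}

/* Old shape: ignore the return values, check for NULL fields afterwards. */
int toy_parse_deferred(const struct toy_chunkfile *cf, struct toy_graph *g)
{
	assert(cf && g);
	toy_read_chunk(&cf->fanout, read_fanout, g);
	toy_read_chunk(&cf->lookup, read_lookup, g);
	toy_read_chunk(&cf->data, read_data, g);
	if (!g->chunk_oid_fanout) {
		toy_error(g, TOY_ERR_MISSING_FANOUT, "OID fanout chunk missing or corrupted");
		return -1;
	}
	if (!g->chunk_oid_lookup) {
		toy_error(g, TOY_ERR_MISSING_LOOKUP, "OID lookup chunk missing or corrupted");
		return -1;
	}
	if (!g->chunk_commit_data) {
		toy_error(g, TOY_ERR_MISSING_DATA, "commit data chunk missing or corrupted");
		return -1;
	}
	return 0;
}

/* New shape: bail at the first chunk that fails to read. */
int toy_parse_fail_fast(const struct toy_chunkfile *cf, struct toy_graph *g)
{
	assert(cf && g);
	if (toy_read_chunk(&cf->fanout, read_fanout, g)) {
		toy_error(g, TOY_ERR_MISSING_FANOUT, "OID fanout chunk missing or corrupted");
		return -1;
	}
	if (toy_read_chunk(&cf->lookup, read_lookup, g)) {
		toy_error(g, TOY_ERR_MISSING_LOOKUP, "OID lookup chunk missing or corrupted");
		return -1;
	}
	if (toy_read_chunk(&cf->data, read_data, g)) {
		toy_error(g, TOY_ERR_MISSING_DATA, "commit data chunk missing or corrupted");
		return -1;
	}
	return 0;
}
```

Given a toy file with a valid fanout chunk, no lookup chunk, and a commit-data chunk holding two records, both functions fail. The deferred variant's first diagnostic blames the commit-data size, because `num_commits` is still zero when the data reader runs. The fail-fast variant's first diagnostic names the missing lookup chunk and never reaches the data reader.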