Message ID | 20231109070948.GA2698043@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | 4bc6d43271e9b5553135f6ae90f6634473077721 |
Headers | show |
Series | some more chunk-file bounds-checks fixes | expand |
On Thu, Nov 09, 2023 at 02:09:48AM -0500, Jeff King wrote: > So instead, we can do a division like this: > > if (chunk_size / GRAPH_DATA_WIDTH != g->num_commits) > > where there's no possibility of overflow. We do lose a little bit of > precision; due to integer division truncation we'd allow up to an extra > GRAPH_DATA_WIDTH-1 bytes of data in the chunk. That's OK. Our main goal > here is making sure we don't have too _few_ bytes, which would cause an > out-of-bounds read (we could actually replace our "!=" with "<", but I > think it's worth being a little pedantic, as a large mismatch could be a > sign of other problems). This is wonderfully explained, and the patch below follows trivially from what you wrote here. So everything in this patch makes sense and looks good to me. It does make me think about the pair_chunk_expect() function that I proposed elsewhere. I haven't yet read the rest of the series, so it may be a totally useless direction by the end of this series ;-). But I wonder if the interface we want is something like: int pair_chunk_expect(struct chunkfile *cf, uint32_t chunk_id, const unsigned char **p, size_t record_size, size_t record_nr); So we can then grab the size of the chunk, divide it by "record_size" and ensure that end up with "record_nr" as a result. Again, this is perhaps totally useless by the end of your series, but just having looked at the first patch I wonder if this is a productive direction to consider... Thanks, Taylor
On Thu, Nov 09, 2023 at 04:13:17PM -0500, Taylor Blau wrote: > So everything in this patch makes sense and looks good to me. It does > make me think about the pair_chunk_expect() function that I proposed > elsewhere. I haven't yet read the rest of the series, so it may be a > totally useless direction by the end of this series ;-). Nope, it's not useless. But I do think it affects what we'd want the interface to look like, and... > But I wonder if the interface we want is something like: > > int pair_chunk_expect(struct chunkfile *cf, uint32_t chunk_id, > const unsigned char **p, > size_t record_size, size_t record_nr); > > So we can then grab the size of the chunk, divide it by "record_size" > and ensure that end up with "record_nr" as a result. ...this is exactly the direction I was thinking it would go. So if you got to the same place after reading my explanation, then hopefully something went right. ;) -Peff
diff --git a/commit-graph.c b/commit-graph.c index ee66098e07..5d7d7a89e5 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -344,7 +344,7 @@ static int graph_read_commit_data(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = data; - if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH) + if (chunk_size / GRAPH_DATA_WIDTH != g->num_commits) return error("commit-graph commit data chunk is wrong size"); g->chunk_commit_data = chunk_start; return 0; @@ -354,7 +354,7 @@ static int graph_read_generation_data(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = data; - if (chunk_size != g->num_commits * sizeof(uint32_t)) + if (chunk_size / sizeof(uint32_t) != g->num_commits) return error("commit-graph generations chunk is wrong size"); g->chunk_generation_data = chunk_start; return 0; @@ -364,7 +364,7 @@ static int graph_read_bloom_index(const unsigned char *chunk_start, size_t chunk_size, void *data) { struct commit_graph *g = data; - if (chunk_size != g->num_commits * 4) { + if (chunk_size / 4 != g->num_commits) { warning("commit-graph changed-path index chunk is too small"); return -1; }
We check the size of chunks with fixed records by multiplying the width of each record by the number of commits in the file. Like: if (chunk_size != g->num_commits * GRAPH_DATA_WIDTH) If this multiplication overflows, we may not notice a chunk is too small (which could later lead to out-of-bound reads). In the current code this is only possible for the CDAT chunk, but the reasons are quite subtle. We compute g->num_commits by dividing the size of the OIDL chunk by the hash length (since it consists of a bunch of hashes). So we know that any size_t multiplication that uses a value smaller than the hash length cannot overflow. And the CDAT records are the only ones that are larger (the others are just 4-byte records). So it's worth fixing all of these, to make it clear that they're not subject to overflow (without having to reason about seemingly unrelated code). The obvious thing to do is add an st_mult(), like: if (chunk_size != st_mult(g->num_commits, GRAPH_DATA_WIDTH)) And that certainly works, but it has one downside: if we detect an overflow, we'll immediately die(). But the commit graph is an optional file; if we run into other problems loading it, we'll generally return an error and fall back to accessing the full objects. Using st_mult() means a malformed file will abort the whole process. So instead, we can do a division like this: if (chunk_size / GRAPH_DATA_WIDTH != g->num_commits) where there's no possibility of overflow. We do lose a little bit of precision; due to integer division truncation we'd allow up to an extra GRAPH_DATA_WIDTH-1 bytes of data in the chunk. That's OK. Our main goal here is making sure we don't have too _few_ bytes, which would cause an out-of-bounds read (we could actually replace our "!=" with "<", but I think it's worth being a little pedantic, as a large mismatch could be a sign of other problems). I didn't add a test here. We'd need to generate a very large graph file in order to get g->num_commits large enough to cause an overflow. And a later patch in this series will use this same division technique in a way that is much easier to trigger in the tests. Signed-off-by: Jeff King <peff@peff.net> --- There are still several st_mult() calls within commit-graph.c, unrelated to my jk/chunk-bounds series. I suspect they're all redundant, as the chunk-size checks give a stricter bound. But checking and removing them is a separate topic. I think the midx code has similar st_mult() problems, but it's quite happy to die() on any error already. So I've left auditing that for another day. Another alternative to the division is introducing an st_mult() variant like: if (!st_mult(&out, &a, &b)) return error(...); We may want to go there in the long run as part of lib-ification, but I didn't want to get into it for this series. commit-graph.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-)