Message ID | 20231109072435.GF2698043@coredump.intra.peff.net (mailing list archive) |
---|---|
State | Accepted |
Commit | d3b6f6c63137b72df5055b71721825e786bcbd6e |
Headers | show |
Series | some more chunk-file bounds-checks fixes | expand |
On Thu, Nov 09, 2023 at 02:24:35AM -0500, Jeff King wrote: > @@ -323,7 +320,8 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start, > { > struct commit_graph *g = data; > g->chunk_oid_lookup = chunk_start; > - g->num_commits = chunk_size / g->hash_len; > + if (chunk_size / g->hash_len != g->num_commits) > + return error(_("commit-graph OID lookup chunk is the wrong size")); > return 0; > } My understanding is that you need this error message to come from graph_read_oid_lookup() in order to pass the "detect incorrect fanout final value" test, but I wish that we didn't have to, since having the more-or-less duplicate error messages in the latter "reader notices fanout/lookup table mismatch" is somewhat unfortunate. > diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh > index affb959d64..8bd7fcefb5 100755 > --- a/t/t5318-commit-graph.sh > +++ b/t/t5318-commit-graph.sh > @@ -560,7 +560,7 @@ test_expect_success 'detect incorrect fanout' ' > > test_expect_success 'detect incorrect fanout final value' ' > corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \ > - "oid table and fanout disagree on size" > + "OID lookup chunk is the wrong size" > ' > > test_expect_success 'detect incorrect OID order' ' > @@ -850,7 +850,8 @@ test_expect_success 'reader notices too-small oid fanout chunk' ' > test_expect_success 'reader notices fanout/lookup table mismatch' ' > check_corrupt_chunk OIDF 1020 "FFFFFFFF" && > cat >expect.err <<-\EOF && > - error: commit-graph oid table and fanout disagree on size > + error: commit-graph OID lookup chunk is the wrong size > + error: commit-graph required OID lookup chunk missing or corrupted > EOF > test_cmp expect.err err > ' > -- > 2.43.0.rc1.572.g273fc7bed6 > Thanks, Taylor
On Thu, Nov 09, 2023 at 04:20:53PM -0500, Taylor Blau wrote: > On Thu, Nov 09, 2023 at 02:24:35AM -0500, Jeff King wrote: > > @@ -323,7 +320,8 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start, > > { > > struct commit_graph *g = data; > > g->chunk_oid_lookup = chunk_start; > > - g->num_commits = chunk_size / g->hash_len; > > + if (chunk_size / g->hash_len != g->num_commits) > > + return error(_("commit-graph OID lookup chunk is the wrong size")); > > return 0; > > } > > My understanding is that you need this error message to come from > graph_read_oid_lookup() in order to pass the "detect incorrect fanout > final value" test, but I wish that we didn't have to, since having the > more-or-less duplicate error messages in the latter "reader notices > fanout/lookup table mismatch" is somewhat unfortunate. I'm not sure what you mean here. We notice the problem in the reading code, which is used in turn by the verify code. So both of these tests: > > test_expect_success 'detect incorrect fanout final value' ' > > corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \ > > - "oid table and fanout disagree on size" > > + "OID lookup chunk is the wrong size" > > ' > > > > test_expect_success 'detect incorrect OID order' ' > > @@ -850,7 +850,8 @@ test_expect_success 'reader notices too-small oid fanout chunk' ' > > test_expect_success 'reader notices fanout/lookup table mismatch' ' > > check_corrupt_chunk OIDF 1020 "FFFFFFFF" && > > cat >expect.err <<-\EOF && > > - error: commit-graph oid table and fanout disagree on size > > + error: commit-graph OID lookup chunk is the wrong size > > + error: commit-graph required OID lookup chunk missing or corrupted > > EOF > > test_cmp expect.err err > > ' will see a message regardless of where we put it. Or by "duplicate" did you mean the two-line: > > + error: commit-graph OID lookup chunk is the wrong size > > + error: commit-graph required OID lookup chunk missing or corrupted output. That's because our chunk of the return value from read_chunk() is a bit lazy, and does not distinguish "missing chunk" (where we should write a string saying so) from errors produced by the callback (where a more specific error message has already been written). That's true of all of the read_chunk() callers, including the midx ones. This is one of the ways that pair_chunk_expect() could do better without adding a lot of code, because it can check the return value from read_chunk(). It doesn't help the other cases (like OIDF) that still have to call read_chunk() themselves, though. Possibly read_chunk() should just take a flag to indicate that it should complain when the chunk is missing. And then callers could just do: if (read_chunk(cf, id, our_callback, CHUNK_REQUIRED) return -1; /* no need to complain; either our_callback() did, or read_chunk() itself */ -Peff
On Thu, Nov 09, 2023 at 04:38:13PM -0500, Jeff King wrote: > Or by "duplicate" did you mean the two-line: > > > > + error: commit-graph OID lookup chunk is the wrong size > > > + error: commit-graph required OID lookup chunk missing or corrupted > > output. That's because our chunk of the return value from read_chunk() > is a bit lazy, and does not distinguish "missing chunk" (where we should > write a string saying so) from errors produced by the callback (where a > more specific error message has already been written). That's true of > all of the read_chunk() callers, including the midx ones. Yeah, I meant duplicate in the sense above. > This is one of the ways that pair_chunk_expect() could do better without > adding a lot of code, because it can check the return value from > read_chunk(). It doesn't help the other cases (like OIDF) that still > have to call read_chunk() themselves, though. Possibly read_chunk() > should just take a flag to indicate that it should complain when the > chunk is missing. And then callers could just do: > > if (read_chunk(cf, id, our_callback, CHUNK_REQUIRED) > return -1; /* no need to complain; either our_callback() did, or > read_chunk() itself */ We do return CHUNK_NOT_FOUND when we have a missing chunk, which we could check for individually. But TBH, I don't find the first error all that useful. In this instance, there's really only one way for the OIDL chunk to be corrupt, which is that it has the wrong size. And short of making the error much more robust, e.g.: error: commit-graph OID lookup chunk is the wrong size (got: $X, wanted: $Y) and dropping the generic "missing or corrupt" error, I think that just the generic error is fine here. > -Peff Thanks, Taylor
On Thu, Nov 09, 2023 at 05:15:05PM -0500, Taylor Blau wrote: > > This is one of the ways that pair_chunk_expect() could do better without > > adding a lot of code, because it can check the return value from > > read_chunk(). It doesn't help the other cases (like OIDF) that still > > have to call read_chunk() themselves, though. Possibly read_chunk() > > should just take a flag to indicate that it should complain when the > > chunk is missing. And then callers could just do: > > > > if (read_chunk(cf, id, our_callback, CHUNK_REQUIRED) > > return -1; /* no need to complain; either our_callback() did, or > > read_chunk() itself */ > > We do return CHUNK_NOT_FOUND when we have a missing chunk, which we > could check for individually. But TBH, I don't find the first error all > that useful. In this instance, there's really only one way for the OIDL > chunk to be corrupt, which is that it has the wrong size. But aren't there two ways it can go wrong? It can be the wrong size, or it can be missing. In the first we say: error: wrong size error: missing or corrupted and in the second we say: error: missing or corrupted Which is why I think issuing a message from the callback has value. Of course the ideal would be: error: wrong size and: error: missing but I didn't think it was worth making the code much longer for an error condition we don't really expect anybody to see in practice. And also... > And short of making the error much more robust, e.g.: > > error: commit-graph OID lookup chunk is the wrong size (got: $X, wanted: $Y) ...yes, I think that would be worth doing, especially if you are centralizing the error messages in pair_chunk_expect(). But your series goes the opposite direction. > and dropping the generic "missing or corrupt" error, I think that just > the generic error is fine here. If you drop that error, then who will warn about the missing-chunk case? The user would see nothing at all. -Peff
diff --git a/commit-graph.c b/commit-graph.c index 374575b484..094814c2ba 100644 --- a/commit-graph.c +++ b/commit-graph.c @@ -300,10 +300,6 @@ static int verify_commit_graph_lite(struct commit_graph *g) return 1; } } - if (ntohl(g->chunk_oid_fanout[255]) != g->num_commits) { - error("commit-graph oid table and fanout disagree on size"); - return 1; - } return 0; } @@ -315,6 +311,7 @@ static int graph_read_oid_fanout(const unsigned char *chunk_start, if (chunk_size != 256 * sizeof(uint32_t)) return error("commit-graph oid fanout chunk is wrong size"); g->chunk_oid_fanout = (const uint32_t *)chunk_start; + g->num_commits = ntohl(g->chunk_oid_fanout[255]); return 0; } @@ -323,7 +320,8 @@ static int graph_read_oid_lookup(const unsigned char *chunk_start, { struct commit_graph *g = data; g->chunk_oid_lookup = chunk_start; - g->num_commits = chunk_size / g->hash_len; + if (chunk_size / g->hash_len != g->num_commits) + return error(_("commit-graph OID lookup chunk is the wrong size")); return 0; } diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh index affb959d64..8bd7fcefb5 100755 --- a/t/t5318-commit-graph.sh +++ b/t/t5318-commit-graph.sh @@ -560,7 +560,7 @@ test_expect_success 'detect incorrect fanout' ' test_expect_success 'detect incorrect fanout final value' ' corrupt_graph_and_verify $GRAPH_BYTE_FANOUT2 "\01" \ - "oid table and fanout disagree on size" + "OID lookup chunk is the wrong size" ' test_expect_success 'detect incorrect OID order' ' @@ -850,7 +850,8 @@ test_expect_success 'reader notices too-small oid fanout chunk' ' test_expect_success 'reader notices fanout/lookup table mismatch' ' check_corrupt_chunk OIDF 1020 "FFFFFFFF" && cat >expect.err <<-\EOF && - error: commit-graph oid table and fanout disagree on size + error: commit-graph OID lookup chunk is the wrong size + error: commit-graph required OID lookup chunk missing or corrupted EOF test_cmp expect.err err '
Commit-graph, midx, and pack idx files all have both a lookup table of oids and an oid fanout table. In midx and pack idx files, we take the final entry of the fanout table as the source of truth for the number of entries, and then verify that the size of the lookup table matches that. But for commit-graph files, we do the opposite: we use the size of the lookup table as the source of truth, and then check the final fanout entry against it. As noted in 4169d89645 (commit-graph: check consistency of fanout table, 2023-10-09), either is correct. But there are a few reasons to prefer the fanout table as the source of truth: 1. The fanout entries are 32-bits on disk, and that defines the maximum number of entries we can store. But since the size of the lookup table is only bounded by the filesystem, it can be much larger. And hence computing it as the commit-graph does means that we may truncate the result when storing it in a uint32_t. 2. We read the fanout first, then the lookup table. If we're verifying the chunks as we read them, then we'd want to take the fanout as truth (we have nothing yet to check it against) and then we can check that the lookup table matches what we already know. 3. It is pointlessly inconsistent with the midx and pack idx code. Since the three have to do similar size and bounds checks, it is easier to reason about all three if they use the same approach. So this patch moves the assignment of g->num_commits to the fanout parser, and then we can check the size of the lookup chunk as soon as we try to load it. There's already a test covering this situation, which munges the final fanout entry to 2^32-1. In the current code we complain that it does not agree with the table size. But now that we treat the munged value as the source of truth, we'll complain that the lookup table is the wrong size (again, either is correct). So we'll have to update the message we expect (and likewise for an earlier test which does similar munging). There's a similar test for this situation on the midx side, but rather than making a very-large fanout value, it just truncates the lookup table. We could do that here, too, but the very-large fanout value actually shows an interesting corner case. On a 32-bit system, multiplying to find the expected table size would cause an integer overflow. Using st_mult() would detect that, but cause us to die() rather than falling back to the non-graph code path. Checking the size using division (as we do with existing chunk-size checks) avoids the overflow entirely, and the test demonstrates this when run on a 32-bit system. Signed-off-by: Jeff King <peff@peff.net> --- commit-graph.c | 8 +++----- t/t5318-commit-graph.sh | 5 +++-- 2 files changed, 6 insertions(+), 7 deletions(-)