Message ID | 7db4ec3e327ed3695f4f5409cb2dc80c72688758.1681748502.git.gitgitgadget@gmail.com (mailing list archive) |
---|---|
State | Accepted |
Commit | d975fe1fa57d57cfd21a97f96f4a94b99f50f2f4 |
Headers | show |
Series | git fsck: check pack rev-index files | expand |
"Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > +corrupt_rev_and_verify () { > + ( > + pos="$1" && > +... > + grep "$error" err > + ) > +} Curious. Would it work equally well to write corrupt_rev_and_verify () ( pos="$1" && ... grep "$error" err ) without an extra level of indentation?
On Mon, Apr 17, 2023 at 04:21:39PM +0000, Derrick Stolee via GitGitGadget wrote: > @@ -309,6 +310,15 @@ int load_pack_revindex(struct repository *r, struct packed_git *p) > */ > int verify_pack_revindex(struct packed_git *p) > { > + /* Do not bother checking if not initialized. */ Yep, makes sense; if we don't have an on-disk reverse index (which is mmap'd into `revindex_map` we don't have anything to verify), so we can bail here. > + if (!p->revindex_map) > + return 0; > + > + if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) { > + error(_("invalid checksum")); > + return -1; > + } > + > return 0; > } > > diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh > index 206c412f50b..6b7c709a1f6 100755 > --- a/t/t5325-reverse-index.sh > +++ b/t/t5325-reverse-index.sh > @@ -145,4 +145,44 @@ test_expect_success 'fsck succeeds on good rev-index' ' > ) > ' > > +test_expect_success 'set up rev-index corruption tests' ' s/set up/setup for easy `--run`-ing (e.g., ./t5325-*.sh --run=setup,fsck). > + git init corrupt && > + ( > + cd corrupt && > + > + test_commit commit && > + git -c pack.writeReverseIndex=true repack -ad && > + > + revfile=$(ls .git/objects/pack/pack-*.rev) && > + chmod a+w $revfile && > + cp $revfile $revfile.bak > + ) > +' > + > +corrupt_rev_and_verify () { > + ( > + pos="$1" && > + value="$2" && > + error="$3" && > + > + cd corrupt && > + revfile=$(ls .git/objects/pack/pack-*.rev) && > + > + # Reset to original rev-file. > + cp $revfile.bak $revfile && > + > + printf "$value" | dd of=$revfile bs=1 seek="$pos" conv=notrunc && > + test_must_fail git fsck 2>err && > + grep "$error" err > + ) > +} > + > +test_expect_success 'fsck catches invalid checksum' ' > + revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) && Would this test be tighter if we introduced a sub-shell and cd'd into "corrupt" here? > + orig_size=$(wc -c <$revfile) && I'm nitpicking, but we may want to use `test_file_size` here instead of `wc -c`. The latter outnumbers the former in terms of number of uses, but I think we consider `test_file_size` to be canonical these days. > + hashpos=$((orig_size - 10)) && > + corrupt_rev_and_verify $hashpos bogus \ > + "invalid checksum" > +' This looks good. Thanks, Taylor
On 4/17/2023 6:15 PM, Junio C Hamano wrote: > "Derrick Stolee via GitGitGadget" <gitgitgadget@gmail.com> writes: > >> +corrupt_rev_and_verify () { >> + ( >> + pos="$1" && >> +... >> + grep "$error" err >> + ) >> +} > > Curious. Would it work equally well to write > > corrupt_rev_and_verify () ( > pos="$1" && > ... > grep "$error" err > ) > > without an extra level of indentation? I've never seen that replacement (and it only exists in our tree in t4018/bash-arithmetic-function and t4018/bash-subshell-function) but it works and looks nicer. Thanks, -Stolee
On 4/17/2023 6:24 PM, Taylor Blau wrote: > On Mon, Apr 17, 2023 at 04:21:39PM +0000, Derrick Stolee via GitGitGadget wrote: >> +test_expect_success 'set up rev-index corruption tests' ' > > s/set up/setup for easy `--run`-ing (e.g., ./t5325-*.sh --run=setup,fsck). Makes sense. >> +test_expect_success 'fsck catches invalid checksum' ' >> + revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) && > > Would this test be tighter if we introduced a sub-shell and cd'd into > "corrupt" here? corrupt_rev_and_verify does the subshell thing. Why should we do that here in the test? >> + orig_size=$(wc -c <$revfile) && > > I'm nitpicking, but we may want to use `test_file_size` here instead of > `wc -c`. The latter outnumbers the former in terms of number of uses, > but I think we consider `test_file_size` to be canonical these days. Makes sense. Thanks, -Stolee
On Tue, Apr 18, 2023 at 10:27:57AM -0400, Derrick Stolee wrote: > >> +test_expect_success 'fsck catches invalid checksum' ' > >> + revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) && > > > > Would this test be tighter if we introduced a sub-shell and cd'd into > > "corrupt" here? > > corrupt_rev_and_verify does the subshell thing. Why should we do that > here in the test? I was thinking that it might be more concise if you moved the subshell to the test and out of corrupt_rev_and_verify. In addition to making corrupt_rev_and_verify work in other instances where the repository isn't required to be in a directory named "corrupt", I think it simplifies the result. Here's what I was thinking, as a diff on top of this patch: --- >8 --- diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index 6b7c709a1f..7dfbaf6b37 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -160,29 +160,30 @@ test_expect_success 'set up rev-index corruption tests' ' ' corrupt_rev_and_verify () { - ( - pos="$1" && - value="$2" && - error="$3" && + pos="$1" && + value="$2" && + error="$3" && - cd corrupt && - revfile=$(ls .git/objects/pack/pack-*.rev) && + revfile=$(ls .git/objects/pack/pack-*.rev) && - # Reset to original rev-file. - cp $revfile.bak $revfile && + # Reset to original rev-file. + cp $revfile.bak $revfile && - printf "$value" | dd of=$revfile bs=1 seek="$pos" conv=notrunc && - test_must_fail git fsck 2>err && - grep "$error" err - ) + printf "$value" | dd of=$revfile bs=1 seek="$pos" conv=notrunc && + test_must_fail git fsck 2>err && + grep "$error" err } test_expect_success 'fsck catches invalid checksum' ' - revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) && - orig_size=$(wc -c <$revfile) && - hashpos=$((orig_size - 10)) && - corrupt_rev_and_verify $hashpos bogus \ - "invalid checksum" + ( + cd corrupt && + + revfile=$(ls .git/objects/pack/pack-*.rev) && + orig_size=$(wc -c <$revfile) && + hashpos=$((orig_size - 10)) && + corrupt_rev_and_verify $hashpos bogus \ + "invalid checksum" + ) ' test_done --- 8< --- If you do take my suggestion, make sure to remember to come back in patches 3/4 and 4/4 and adjust those instances of 'corrupt_rev_and_verify' to first change into "corrupt". Thanks, Taylor
On 4/18/2023 10:51 AM, Taylor Blau wrote: > On Tue, Apr 18, 2023 at 10:27:57AM -0400, Derrick Stolee wrote: >>>> +test_expect_success 'fsck catches invalid checksum' ' >>>> + revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) && >>> >>> Would this test be tighter if we introduced a sub-shell and cd'd into >>> "corrupt" here? >> >> corrupt_rev_and_verify does the subshell thing. Why should we do that >> here in the test? > > I was thinking that it might be more concise if you moved the subshell > to the test and out of corrupt_rev_and_verify. In addition to making > corrupt_rev_and_verify work in other instances where the repository > isn't required to be in a directory named "corrupt", I think it > simplifies the result. I don't think there is a good reason to allow using a different repo name. This is the only test that requires doing anything but calling corrupt_rev_and_verify with different parameters, so I think this makes the test script at the end of the series noisier. Thanks, -Stolee
On Tue, Apr 18, 2023 at 10:57:15AM -0400, Derrick Stolee wrote: > On 4/18/2023 10:51 AM, Taylor Blau wrote: > > On Tue, Apr 18, 2023 at 10:27:57AM -0400, Derrick Stolee wrote: > >>>> +test_expect_success 'fsck catches invalid checksum' ' > >>>> + revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) && > >>> > >>> Would this test be tighter if we introduced a sub-shell and cd'd into > >>> "corrupt" here? > >> > >> corrupt_rev_and_verify does the subshell thing. Why should we do that > >> here in the test? > > > > I was thinking that it might be more concise if you moved the subshell > > to the test and out of corrupt_rev_and_verify. In addition to making > > corrupt_rev_and_verify work in other instances where the repository > > isn't required to be in a directory named "corrupt", I think it > > simplifies the result. > > I don't think there is a good reason to allow using a different repo > name. This is the only test that requires doing anything but calling > corrupt_rev_and_verify with different parameters, so I think this > makes the test script at the end of the series noisier. No worries. I was thinking that it might be convenient in the future if we wanted to corrupt a .rev file in a different repository, but that's absolutely a bridge we can cross if/when we get to it. Thanks, Taylor
diff --git a/pack-revindex.c b/pack-revindex.c index c3f2aaa3fea..007a806994f 100644 --- a/pack-revindex.c +++ b/pack-revindex.c @@ -5,6 +5,7 @@ #include "packfile.h" #include "config.h" #include "midx.h" +#include "csum-file.h" struct revindex_entry { off_t offset; @@ -309,6 +310,15 @@ int load_pack_revindex(struct repository *r, struct packed_git *p) */ int verify_pack_revindex(struct packed_git *p) { + /* Do not bother checking if not initialized. */ + if (!p->revindex_map) + return 0; + + if (!hashfile_checksum_valid((const unsigned char *)p->revindex_map, p->revindex_size)) { + error(_("invalid checksum")); + return -1; + } + return 0; } diff --git a/t/t5325-reverse-index.sh b/t/t5325-reverse-index.sh index 206c412f50b..6b7c709a1f6 100755 --- a/t/t5325-reverse-index.sh +++ b/t/t5325-reverse-index.sh @@ -145,4 +145,44 @@ test_expect_success 'fsck succeeds on good rev-index' ' ) ' +test_expect_success 'set up rev-index corruption tests' ' + git init corrupt && + ( + cd corrupt && + + test_commit commit && + git -c pack.writeReverseIndex=true repack -ad && + + revfile=$(ls .git/objects/pack/pack-*.rev) && + chmod a+w $revfile && + cp $revfile $revfile.bak + ) +' + +corrupt_rev_and_verify () { + ( + pos="$1" && + value="$2" && + error="$3" && + + cd corrupt && + revfile=$(ls .git/objects/pack/pack-*.rev) && + + # Reset to original rev-file. + cp $revfile.bak $revfile && + + printf "$value" | dd of=$revfile bs=1 seek="$pos" conv=notrunc && + test_must_fail git fsck 2>err && + grep "$error" err + ) +} + +test_expect_success 'fsck catches invalid checksum' ' + revfile=$(ls corrupt/.git/objects/pack/pack-*.rev) && + orig_size=$(wc -c <$revfile) && + hashpos=$((orig_size - 10)) && + corrupt_rev_and_verify $hashpos bogus \ + "invalid checksum" +' + test_done